(57) Abstract

The invention concerns members of the endocytic type C lectin family and methods and means for producing them. The native 
polypeptides of the invention are characterized by containing a signal sequence, a cysteine rich domain, a fibronectin type II domain, 8 
type C lectin domains, a transmembrane domain and a cytoplasmic domain. Nucleotide sequences encoding such polypeptides, vectors 
containing the nucleotide sequences, recombinant host cells transformed with the vectors, and methods for the recombinant production of
the type C lectins are also within the scope of the invention.
TYPE C LECTINS
Field of the Invention

The present invention concerns novel type C lectins. More particularly, the invention relates to new 
members of the endocytic type C lectin family and functional derivatives of such novel polypeptides. 
Background of the Invention

The recognition of carbohydrates by lectins has been found to play an important role in various aspects
of eukaryotic physiology. A number of different animal and plant lectin families exist, but it is the calcium
dependent, or type C, lectins that have recently garnered the most attention. For example, the recognition of
carbohydrate residues on either endothelial cells or leukocytes by the selectin family of calcium dependent lectins
has been found to be of profound importance to the trafficking of leukocytes to inflammatory sites. Lasky, L.,
Ann. Rev. Biochem. 64, 113-139 (1995). The biophysical analysis of these adhesive interactions has suggested
that lectin-carbohydrate binding evolved in this case to allow for the adhesion between leukocytes and the
endothelium under the high shear conditions of the vasculature. Alon et al., Nature (1995) in press. Thus, the
rapid on rates of carbohydrate recognition by such lectins allow for a hasty acquisition of ligand, a necessity
under the high shear of the vascular flow. The physiological use of type C lectins in this case is also supported
by the relatively low affinities of these interactions, a requirement for the leukocyte rolling phenomenon that has
been observed to occur at sites of acute inflammation. The crystal structures of the mannose binding protein
(Weis et al., Science 254, 1608-1615 [1991]; Weis et al., Nature 360, 127-134 [1992]) and E-selectin (Graves
et al., Nature 367(6463), 532-538 [1994]), together with various mutagenesis analyses (Erbe et al., J. Cell. Biol.
119(1), 215-227 [1992]; Drickamer, Nature 360, 183-186 [1992]; Iobst et al., J. Biol. Chem. 269(22), 15505-15511
[1994]; Kogan et al., J. Biol. Chem. 270(23), 14047-14055 [1995]), are consistent with the supposition that
the type C lectins are, in general, involved with the rapid recognition of clustered carbohydrates. Together, these
data suggest that type C lectins mediate a number of critical physiological phenomena through the rapid,
relatively low affinity recognition of carbohydrates.

While a number of different type C lectin families are known, a particularly unusual group is that
represented by the macrophage mannose (Taylor et al., J. Biol. Chem. 265(21), 12156-62 [1990]; Harris et al.,
Blood 80(9), 2363-73 [1992]), phospholipase A2 (Ishizaki et al., J. Biol. Chem. 269(8), 5897-904 [1994];
Lambeau et al., J. Biol. Chem. 269(3), 1575-8 [1994]; Higashino et al., Eur. J. Biochem. 225(1), 375-82 [1994])
and DEC 205 (Jiang et al., Nature 375(6527), 151-5 [1995]) receptors. While most of the members of the type
C lectin group contain only a single carbohydrate binding domain, these three receptors contain either 8
(macrophage mannose and phospholipase A2 receptors) or 10 (DEC 205 receptor) lectin domains, and it is likely
that these domains cooperate with each other to enhance ligand avidity (Taylor et al., J. Biol. Chem. 267(3),
1719-20 [1992]; Taylor et al., J. Biol. Chem. 268(1), 399-404 [1993]). All three of these molecules appear to
be type 1 transmembrane proteins, and they all appear to mediate various endocytic phenomena. Accordingly,
this family will hereafter be referred to as the endocytic type C lectin family (Harris et al., supra; Jiang et al.,
supra; Zvaritch et al., J. Biol. Chem. 271(1), 250-7 [1996]). The endocytic mechanism is particularly important
in the case of the macrophage mannose receptor, expressed predominantly on macrophages and liver endothelium
(Harris et al., supra), and the DEC 205 receptor (Jiang et al., supra), expressed specifically on dendritic and
thymic epithelial cells. Thus, both of these receptors appear to mediate the endocytosis of large particulate (i.e.
pathogens such as yeast) (the macrophage mannose receptor) or highly glycosylated molecular (the DEC 205
receptor) complexes. In both cases, the endocytosis of glycosylated complexes by these receptors is involved
with the transport of either particles or glycoproteins to the endosomal pathway where they are degraded and,
in the case of the DEC 205 receptor, efficiently presented to cells of the immune system by the dendritic or
thymic epithelial cells (Jiang et al., supra). It therefore seems likely that both of these receptors are involved with
the presentation of highly glycosylated structures to immune cells to allow for efficient responses against
pathogenic organisms. Interestingly, the phospholipase A2 receptor is also likely to be involved with the
endocytic uptake of extracellular proteins, although in this case the ligand appears to be an endogenous protein, i.e. one
or more phospholipases (Ishizaki et al., supra; Lambeau et al., supra; Higashino et al., supra; Zvaritch et al.,
supra). The exact biological function of this receptor, other than as a high affinity mediator of phospholipase
binding, is unknown, and its tissue expression pattern appears to be far broader than that of the other two
receptors in this family (Higashino et al., supra). In addition, it is not clear that the binding of phospholipase to
this receptor is mediated by protein-carbohydrate interactions, although this receptor is clearly capable of binding
glycosylated proteins (Lambeau et al., supra). In summary, all three of the known members of this family of type
C lectins appear to be involved with the binding and uptake of either large particulate or molecular complexes
into the endocytic pathway of the cell, and in the case of both the macrophage mannose and DEC 205 receptors,
these interactions appear to be via protein-carbohydrate recognition.

Summary of the Invention 
The present invention is based on the identification, recombinant production and characterization of
a novel member of the family of endocytic type C lectins. More specifically, the invention concerns a novel
polypeptide comprising a region which shows a distant (~23%) homology to a region of the E-selectin lectin
domain. In analyzing the homologous sequence motif, we have surprisingly found that, despite the low degree
of homology, the residues that were identical with residues in the E-selectin lectin domain were included in the
subset of amino acids that are conserved in the vast majority of type C lectins. Based upon this observation and
further findings which will be described hereinafter, the novel protein has been identified as a new member of
the family of endocytic type C lectins. The novel protein contains domains that are distantly related, but similar
in overall structure, to those found in the other members of this lectin family. In addition, it appears to be
expressed specifically in some highly endothelialized regions of the embryo and adult as well as by actively
growing and differentiating chondrocytes in the embryo. These data suggest that this lectin represents a novel
member of the endocytic lectin family that may be involved with the endocytosis of glycosylated complexes by
the endothelium as well as by chondrocytes during cartilage formation.

In one aspect, the present invention concerns novel isolated mammalian type C lectins closely related 
to the macrophage mannose receptor, the phospholipase A2 receptor and the DEC 205 receptor, all members of 
the family of type C lectins containing multiple lectin domains which mediate endocytosis, and functional 
derivatives of the novel type C lectins. The native polypeptides within the scope of the present invention are 
characterized by containing a signal sequence, a cysteine rich domain, a fibronectin type II domain, 8 type C
lectin domains, a transmembrane domain and a short cytoplasmic domain. The present invention specifically 
includes the soluble forms of the new receptor molecules, which are devoid of an active transmembrane domain 
and optionally of all or part of the cytoplasmic domain. 
In a particular embodiment, the invention concerns isolated type C lectins selected from the group
consisting of

(1) a polypeptide comprising the amino acid sequence shown in Figure 2 (SEQ. ID. NO: 2);

(2) a polypeptide comprising the amino acid sequence shown in Figure 9 (SEQ. ID. NO: 4);

(3) a further mammalian homologue of polypeptide (1) or (2);

(4) a soluble form of any of the polypeptides (1) - (3) devoid of an active transmembrane domain;
and

(5) a derivative of any of the polypeptides (1) - (3), retaining the qualitative carbohydrate recognition
properties of a polypeptide (1), (2) or (3).

The native type C lectins of the present invention are glycoproteins. The present invention encompasses
variant molecules unaccompanied by native glycosylation or having a variant glycosylation pattern.

In a further embodiment, the invention concerns an antagonist of a novel type C lectin of the present 
invention. 

The invention further concerns a nucleic acid molecule encoding a novel type C lectin of the present
invention, vectors containing such nucleic acid, and host cells transformed with the vectors. The nucleic acid
preferably encodes at least the fibronectin type II domain and the first three lectin domains of a native or variant
type C lectin of the present invention. The invention further includes nucleic acid hybridizing under stringent
conditions to the complement of a nucleic acid encoding a native type C lectin of the present invention, and
encoding a protein retaining the qualitative carbohydrate binding properties of a native type C lectin herein.

In another aspect, the invention concerns a process for producing a type C lectin as hereinabove defined,
which comprises transforming a host cell with nucleic acid encoding the desired type C lectin, culturing the
transformed host cell and recovering the type C lectin produced from the host cell culture.

In a further aspect, the invention concerns an antibody capable of specific binding to a type C lectin 
of the present invention, and to a hybridoma cell line producing such antibody. 

In a still further aspect, the invention concerns an immunoadhesin comprising a novel type C lectin
sequence as hereinabove described fused to an immunoglobulin sequence. The type C lectin sequence is
preferably a transmembrane-domain deleted form of a native or variant polypeptide fused to an immunoglobulin
constant domain sequence, and comprises at least the fibronectin type II domain and a carbohydrate recognition
(lectin) domain of a native type C lectin of the present invention. In another preferred embodiment, the type C
lectin sequence present in the immunoadhesin shows at least about 80% sequence homology with the fibronectin
type II domain and/or with at least one of the first three carbohydrate recognition domains of a native type C
lectin of the present invention. The immunoglobulin constant domain sequence preferably is that of an IgG-1,
IgG-2 or IgG-3 molecule.

The invention further concerns pharmaceutical compositions comprising a type C lectin as hereinabove
defined in admixture with a pharmaceutically acceptable carrier.

Brief Description of the Drawings 
Figure 1. Sequence homology between the E-selectin lectin domain and an EST. Shown is the
homologous sequence (T11885) (SEQ. ID. NO: 9) derived from a search of the expressed sequence tag (EST)
database with the E-selectin lectin domain (SEQ. ID. NO: 8). The region of homology was found within amino
acids 10-67 of the E-selectin lectin domain.

Figure 2. The DNA and derived protein sequence of the cDNA encoding the E-selectin
homologous murine sequence. Illustrated is the entire DNA sequence (SEQ. ID. NO: 1) and derived protein
sequence (SEQ. ID. NO: 2) of the murine cDNA clones and RACE products derived using the T11885 DNA
sequence as a probe. The region homologous to the original EST stretches from amino acids 995 to 1,061.

Figure 3. Protein homologies between the novel type C lectin (SEQ. ID. NO: 2), the macrophage 
mannose receptor (SEQ. ID. NO: 5), the phospholipase A2 receptor (SEQ. ID. NO: 7) and the DEC 205 
receptor (SEQ. ID. NO: 6). Illustrated are the conserved residues in the three members of the endocytic type 
C lectin family (boxed). Overlined are shown the signal sequence, cysteine rich, fibronectin type II, type C lectin,
transmembrane and cytoplasmic domains. The ninth and tenth type C lectin domains of the DEC 205 receptor 
were deleted to allow for a clearer alignment. 

Figure 4. Domain homologies and relative percent conservation between the novel lectin, the 
macrophage mannose receptor, the phospholipase A2 receptor and the DEC 205 receptor. Illustrated are 
the various domains and the percent conservation between these domains in the novel type C lectin and the other
three members of the endocytic type C lectin family. The domains are as follows: Cys-rich: cysteine rich, Fn II: 
fibronectin type 2, CRD: carbohydrate recognition domain (type C lectin), TM: transmembrane, CYTO: 
cytoplasmic. 

Figure 5. Genomic blot probed with the novel receptor cDNA and the genomic structure of the
gene encoding the novel receptor. A. A "zoo blot" containing genomic DNAs isolated from various organisms
and digested with EcoRI was probed with the original EST fragment isolated by PCR from the heart library. B.
The top of the figure illustrates the domain structure of the novel type C lectin and the approximate sites
determined by dot blotting and PCR analysis for each intron (arrowheads). Below is shown the genomic locus
with each exon defined as a small box.

Figure 6. Northern blot analysis of human and murine tissues and cell lines for expression of the
transcript encoding the novel type C lectin. A. A commercial northern blot containing either whole murine
fetal RNA (left panel) or RNA derived from adult murine tissues was probed with the original EST derived
fragment isolated from the murine heart cDNA library. B. A commercial northern blot containing RNA isolated
from various adult or fetal human tissues was probed with the original EST derived from the human heart cDNA
library. C. A commercial blot containing RNA isolated from: a. promyelocyte leukemia-HL-60, b. HeLa cell-S3,
c. chronic myelogenous leukemia-K-562, d. lymphoblastic leukemia-MOLT-4, e. Burkitt's lymphoma-Raji, f.
colorectal adenocarcinoma-SW480, g. lung carcinoma-A549 and h. melanoma-G361 human tumor cell lines was
probed with the original EST derived from the human heart cDNA library.

Figure 7. Characterization of the 5 prime region of the alternatively spliced human fetal liver
transcript. The sequence illustrates that the human full length (MRX) and alternately spliced (FL) transcripts
were identical from the region 3 prime to nucleotide 61 of the alternately spliced fetal liver clone. The top part
of the figure illustrates PCR analysis using two 5 prime primers specific for either the full length transcript
(primer 1) (SEQ. ID. NO: 12) or the alternately spliced transcript (primer 2) (SEQ. ID. NO: 13). The 3 prime
PCR primer is shown at the end of the sequence and is identical in both cases (SEQ. ID. NO: 14). An internal
oligonucleotide probe used for hybridization is shown as the middle primer and is also identical for both
sequences (SEQ. ID. NO: 15). 1 or 2 in the top panels refer to the 5 prime primers utilized for the PCR reaction
for each tissue. The panels illustrate that the smaller PCR fragment (2) corresponds to the alternately spliced
transcript, and it is found only in the fetal liver and not in the lung or heart.

Figure 8. In situ hybridization analysis of neonatal and embryonic tissues with the novel type C
lectin. A. Lung hybridized with antisense probe, B. Lung hybridized with sense probe, C. Kidney glomerulus
hybridized with antisense probe, D. Choroid plexus hybridized with antisense probe, E. Developing sternum
hybridized with antisense probe, F. Developing sternum hybridized with sense probe, G. Developing tooth
hybridized with antisense probe, H. Developing cartilage of the larynx hybridized with antisense probe.

Figure 9. The protein sequence of the novel human type C lectin (SEQ. ID. NO: 4).

Detailed Description of the Invention 

A. Definitions 

The phrases "novel type C lectin" and "novel endocytic type C lectin" are used interchangeably and
refer to new native members of the family of endocytic type C lectins, which are expressed specifically in some
highly endothelialized regions of the embryo and adults, and in actively growing and differentiating chondrocytes
in the embryo, and to functional derivatives of such native polypeptides.

The terms "native (novel) endocytic type C lectin" and "native (novel) type C lectin" in this context refer
to novel naturally occurring endocytic type C lectin receptors, comprising a cysteine rich domain, a fibronectin
type II domain, multiple type C lectin domains, a transmembrane domain and a cytoplasmic domain, with or
without a native signal sequence, and naturally occurring soluble forms of such type C lectin receptors, with or
without the initiating methionine, whether purified from native source, synthesized, produced by recombinant
DNA technology or by any combination of these and/or other methods. The native type C lectins of the present
invention specifically include the murine type C lectin, the amino acid sequence of which is shown in Figure 2
(SEQ. ID. NO: 2), and the human type C lectin having the amino acid sequence shown in Figure 9 (SEQ. ID.
NO: 4), and further mammalian homologues of these native receptors. The novel native murine and human type
C lectins of the present invention are about 1480 amino acids in length, and comprise a signal sequence (amino
acids 1-36), a cysteine-rich domain (from about amino acid position 37 to about amino acid position 174), a
fibronectin type II domain (from about amino acid position 175 to about amino acid position 229), eight
carbohydrate recognition (lectin) domains (CRDs) (CRD1: about aa 234-360; CRD2: about aa 381-507; CRD3:
about aa 520-645; CRD4: about aa 667-809; CRD5: about aa 824-951; CRD6: about aa 970-1108; CRD7: about
aa 1110-1243; CRD8: about aa 1259-1393); a transmembrane domain (from about amino acid position 1410 to
about amino acid position 1434); and a cytoplasmic domain, extending to the C-terminus of the molecule. The
boundaries of these domains are indicated in Figure 3 for the novel murine type C lectin sequence.
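
By way of illustration only, the approximate domain boundaries recited above can be collected into a simple lookup table. The following Python sketch is not part of the original disclosure; the names `DOMAINS` and `domain_at` are hypothetical, and the end of the cytoplasmic domain is assumed to coincide with the stated approximate length of 1480 residues.

```python
# Approximate domain boundaries (1-based, inclusive) as recited in the text.
# DOMAINS and domain_at are illustrative names, not part of the disclosure.
DOMAINS = {
    "signal": (1, 36),
    "cys_rich": (37, 174),
    "fn_II": (175, 229),
    "CRD1": (234, 360),
    "CRD2": (381, 507),
    "CRD3": (520, 645),
    "CRD4": (667, 809),
    "CRD5": (824, 951),
    "CRD6": (970, 1108),
    "CRD7": (1110, 1243),
    "CRD8": (1259, 1393),
    "TM": (1410, 1434),
    # Assumption: the cytoplasmic domain runs from the residue after the
    # transmembrane domain to the approximate C-terminus at position 1480.
    "cyto": (1435, 1480),
}


def domain_at(position):
    """Return the name of the domain containing a residue, or None for a linker."""
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    return None
```

Under these assumptions, a residue such as position 995 (the start of the region homologous to the original EST, per the description of Figure 2) would be reported as lying in CRD6, while positions between recited domains return `None`.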

The terms "soluble form", "soluble receptor", "soluble type C lectin", "soluble endocytic type C lectin",
and grammatical variants thereof, refer to variants of the native or variant type C lectins of the present invention
which are devoid of a functional transmembrane domain. In the soluble receptors the transmembrane domain
may be deleted, truncated or otherwise inactivated such that they are not capable of cell membrane anchorage.
If desired, such soluble forms of the type C lectins of the present invention might additionally have their
cytoplasmic domains fully or partially deleted or otherwise inactivated.
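
As a purely schematic illustration of the deletion option described above, the following Python sketch removes the transmembrane span from a full-length amino acid string. It is not a method taught by the specification: the function name `soluble_form` and its `keep_cytoplasmic` parameter are hypothetical, the boundaries are the approximate positions 1410-1434 given earlier for the native sequences, and an actual construct would be designed at the nucleic acid level.

```python
# Approximate transmembrane boundaries (1-based, inclusive) from the text.
TM_START, TM_END = 1410, 1434


def soluble_form(sequence, keep_cytoplasmic=False):
    """Return the sequence with the transmembrane span deleted.

    By default the cytoplasmic tail is removed as well; with
    keep_cytoplasmic=True only the transmembrane residues are dropped.
    """
    extracellular = sequence[:TM_START - 1]  # residues 1..1409
    if keep_cytoplasmic:
        return extracellular + sequence[TM_END:]  # append residues 1435..end
    return extracellular


# Placeholder sequence of the stated approximate length (not a real protein).
full_length = "A" * 1409 + "L" * 25 + "K" * 46
print(len(soluble_form(full_length)))
print(len(soluble_form(full_length, keep_cytoplasmic=True)))
```

The default corresponds to a form lacking both the transmembrane and cytoplasmic domains; the optional flag corresponds to a form in which only the membrane anchor is deleted.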



WO 97/40154 PCT/US97/06347 

A "functional derivative" of a polypeptide is a compound having a qualitative biological activity in
common with the native polypeptide. Thus, a functional derivative of a native novel type C lectin of the present
invention is a compound that has a qualitative biological activity in common with such native lectin. "Functional
derivatives" include, but are not limited to, fragments of native polypeptides from any animal species (including
humans), derivatives of native (human and non-human) polypeptides and their fragments, and peptide and
non-peptide analogs of native polypeptides, provided that they have a biological activity in common with a respective
native polypeptide. "Fragments" comprise regions within the sequence of a mature native polypeptide. The term
"derivative" is used to define amino acid sequence and glycosylation variants, and covalent modifications of a
native polypeptide. "Non-peptide analogs" are organic compounds which display substantially the same surface
as peptide analogs of the native polypeptides. Thus, the non-peptide analogs of the native novel type C lectins
of the present invention are organic compounds which display substantially the same surface as peptide analogs
of the native type C lectins. Such compounds interact with other molecules in a similar fashion as the peptide
analogs, and mimic a biological activity of a native type C lectin of the present invention. Preferably, amino acid
sequence variants of the present invention retain at least one domain of a native type C lectin, or have at least
about 60% amino acid sequence identity, more preferably at least about 70% amino acid sequence identity, even
more preferably at least about 80% amino acid sequence identity, most preferably at least about 90% amino acid
sequence identity with a domain of a native type C lectin of the present invention. The amino acid sequence
variants preferably show the highest degree of amino acid sequence homology with the fibronectin type II or the
lectin-like domain(s), preferably the first three lectin-like (carbohydrate-binding) domains of native type C lectins
of the present invention. These are the domains which show the highest percentage amino acid conservation
between the novel type C lectins of the present invention and other members of the endocytic type C lectin family
(Figure 4).

The terms "covalent modification" and "covalent derivatives" are used interchangeably and include,
but are not limited to, modifications of a native polypeptide or a fragment thereof with an organic proteinaceous
or non-proteinaceous derivatizing agent, fusions to heterologous polypeptide sequences, and post-translational
modifications. Covalent modifications are traditionally introduced by reacting targeted amino acid residues with
an organic derivatizing agent that is capable of reacting with selected side chains or terminal residues, or by harnessing
mechanisms of post-translational modifications that function in selected recombinant host cells. Certain
post-translational modifications are the result of the action of recombinant host cells on the expressed polypeptide.
Glutaminyl and asparaginyl residues are frequently post-translationally deamidated to the corresponding glutamyl
and aspartyl residues. Alternatively, these residues are deamidated under mildly acidic conditions. Other
post-translational modifications include hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of
seryl, tyrosyl or threonyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side
chains [T.E. Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San Francisco, pp.
79-86 (1983)]. Covalent derivatives/modifications specifically include fusion proteins comprising native type
C lectin sequences of the present invention and their amino acid sequence variants, such as immunoadhesins, and
N-terminal fusions to heterologous signal sequences.

The term "biological activity" in the context of the present invention is defined as the possession of at 
least one adhesive, regulatory or effector function qualitatively in common with a native polypeptide. Preferred 
functional derivatives within the scope of the present invention are unified by retaining the qualitative
carbohydrate recognition properties of a native endocytic type C lectin of the present invention. 

"Identity" or "homology" with respect to a native polypeptide and its functional derivative is defined 
herein as the percentage of amino acid residues in the candidate sequence that are identical with the residues of
a corresponding native polypeptide, after aligning the sequences and introducing gaps, if necessary, to achieve
the maximum percent homology, and not considering any conservative substitutions as part of the sequence
identity. Neither N- nor C-terminal extensions nor insertions shall be construed as reducing identity or homology.
Methods and computer programs for the alignment are well known in the art. 
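
The definition above can be sketched in code for a pair of sequences that have already been aligned. The Python function below is a minimal illustration under stated assumptions rather than the program used by the applicants: the name `percent_identity` is hypothetical, gaps are assumed to be written as "-", and the denominator is assumed to be the number of native residues, so that candidate insertions and terminal extensions (columns where the native sequence has a gap) are ignored, consistent with the statement that they do not reduce identity.

```python
def percent_identity(native_aligned, candidate_aligned, gap="-"):
    """Percent of native residues matched by an identical candidate residue.

    Both arguments must be rows of the same pairwise alignment. Columns in
    which the native sequence has a gap (candidate insertions or N-/C-terminal
    extensions) are skipped. Conservative substitutions count as mismatches.
    """
    if len(native_aligned) != len(candidate_aligned):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    native_residues = 0
    for n, c in zip(native_aligned, candidate_aligned):
        if n == gap:
            continue  # insertion or extension in the candidate: ignored
        native_residues += 1
        if n == c:
            identical += 1
    if native_residues == 0:
        raise ValueError("native sequence contains no residues")
    return 100.0 * identical / native_residues
```

With this convention, a candidate that differs from a four-residue native stretch at one position scores 75%, while a candidate carrying only an N-terminal extension or an internal insertion still scores 100%. Deletions in the candidate do lower the score, which is one reasonable reading of the definition but remains an assumption of this sketch.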

The term "agonist" is used to refer to peptide and non-peptide analogs of the native type C lectins of 
the present invention and to antibodies specifically binding such native type C lectins provided that they retain
at least one biological activity of a native type C lectin. Preferably, the agonists of the present invention retain 
the qualitative carbohydrate recognition properties of the native type C lectin polypeptides. 

The term "antagonist" is used to refer to a molecule inhibiting a biological activity of a native type C 
lectin of the present invention. Preferably, the antagonists herein inhibit the carbohydrate-binding of a native 
type C lectin of the present invention. Preferred antagonists essentially completely block the binding of a native
type C lectin to a carbohydrate structure to which it otherwise binds. 

Ordinarily, the terms "amino acid" and "amino acids" refer to all naturally occurring L-α-amino acids.
In some embodiments, however, D-amino acids may be present in the polypeptides or peptides of the present
invention in order to facilitate conformational restriction. For example, in order to facilitate disulfide bond
formation and stability, a D-amino acid cysteine may be provided at one or both termini of a peptide functional
derivative or peptide antagonist of the native type C lectins of the present invention. The amino acids are
identified by either the single-letter or three-letter designations:



Asp   D   aspartic acid        Ile   I   isoleucine
Thr   T   threonine            Leu   L   leucine
Ser   S   serine               Tyr   Y   tyrosine
Glu   E   glutamic acid        Phe   F   phenylalanine
Pro   P   proline              His   H   histidine
Gly   G   glycine              Lys   K   lysine
Ala   A   alanine              Arg   R   arginine
Cys   C   cysteine             Trp   W   tryptophan
Val   V   valine               Gln   Q   glutamine
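
For convenience, the correspondence between the three-letter and single-letter designations can be expressed as a lookup table. The Python sketch below is illustrative only; the names `THREE_TO_ONE` and `to_one_letter` are hypothetical, and the mapping lists the twenty standard amino acids, which includes Met (M) and Asn (N) under their conventional codes even though those entries are not among those listed above.

```python
# Standard three-letter to single-letter amino acid designations.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence, separator="-"):
    """Convert e.g. 'Asp-Ile-Thr' to 'DIT'; raises KeyError on unknown codes."""
    codes = three_letter_sequence.split(separator)
    return "".join(THREE_TO_ONE[code.strip().capitalize()] for code in codes)
```

The function normalizes capitalization, so inputs such as `"gln-CYS"` are accepted, and an unrecognized code raises `KeyError` rather than being silently dropped.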

The term "amino acid sequence variant" refers to molecules with some differences in their amino acid
sequences as compared to a native amino acid sequence.

Substitutional variants are those that have at least one amino acid residue in a native sequence removed 
and a different amino acid inserted in its place at the same position. 

Insertional variants are those with one or more amino acids inserted immediately adjacent to an amino
acid at a particular position in a native sequence. Immediately adjacent to an amino acid means connected to
either the α-carboxy or α-amino functional group of the amino acid.

Deletional variants are those with one or more amino acids in the native amino acid sequence removed.
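
The three variant classes defined above can be distinguished mechanically once a variant has been aligned against the native sequence. The following Python sketch is offered only as an illustration; `classify_variant` is a hypothetical helper, and the inputs are assumed to be rows of an existing pairwise alignment with "-" marking gaps.

```python
def classify_variant(native_aligned, variant_aligned, gap="-"):
    """Return the set of variant classes present in a pairwise alignment.

    'substitutional': a different residue at the same position
    'insertional':    a variant residue opposite a gap in the native sequence
    'deletional':     a gap in the variant opposite a native residue
    """
    if len(native_aligned) != len(variant_aligned):
        raise ValueError("aligned sequences must have equal length")
    kinds = set()
    for n, v in zip(native_aligned, variant_aligned):
        if n == v:
            continue  # identical column (or gap opposite gap)
        if n == gap:
            kinds.add("insertional")
        elif v == gap:
            kinds.add("deletional")
        else:
            kinds.add("substitutional")
    return kinds
```

A single variant may fall into more than one class, which is why a set is returned; an empty set indicates that the aligned sequences are identical.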

"Antibodies (Abs)" and "immunoglobulins (Igs)" are glycoproteins having the same structural 
characteristics. While antibodies exhibit binding specificity to a specific antigen, immunoglobulins include both 
antibodies and other antibody-like molecules which lack antigen specificity. Polypeptides of the latter kind are, 
for example, produced at low levels by the lymph system and at increased levels by myelomas.

Native antibodies and immunoglobulins are usually heterotetrameric glycoproteins of about 150,000
daltons, composed of two identical light (L) chains and two identical heavy (H) chains. Each light chain is linked
to a heavy chain by one covalent disulfide bond, while the number of disulfide linkages varies between the heavy
chains of different immunoglobulin isotypes. Each heavy and light chain also has regularly spaced intrachain
disulfide bridges. Each heavy chain has at one end a variable domain (VH) followed by a number of constant
domains. Each light chain has a variable domain at one end (VL) and a constant domain at its other end; the
constant domain of the light chain is aligned with the first constant domain of the heavy chain, and the light chain
variable domain is aligned with the variable domain of the heavy chain. Particular amino acid residues are
believed to form an interface between the light and heavy chain variable domains (Chothia et al., J. Mol. Biol.
186, 651-663 [1985]; Novotny and Haber, Proc. Natl. Acad. Sci. USA 82, 4592-4596 [1985]).

The light chains of antibodies (immunoglobulins) from any vertebrate species can be assigned to one
of two clearly distinct types, called kappa (κ) and lambda (λ), based on the amino acid sequences of their constant
domains.

Depending on the amino acid sequence of the constant domain of their heavy chains, immunoglobulins
can be assigned to different classes. There are five major classes of immunoglobulins: IgA, IgD, IgE, IgG and
IgM, and several of these may be further divided into subclasses (isotypes), e.g. IgG-1, IgG-2, IgG-3, and IgG-4;
IgA-1 and IgA-2. The heavy chain constant domains that correspond to the different classes of immunoglobulins
are called α, δ, ε, γ, and μ, respectively. The subunit structures and three-dimensional configurations
of different classes of immunoglobulins are well known.

The term "antibody" is used in the broadest sense and specifically covers single monoclonal antibodies
(including agonist and antagonist antibodies), antibody compositions with polyepitopic specificity, as well as
antibody fragments (e.g., Fab, F(ab')2, and Fv), so long as they exhibit the desired biological activity.

The term "monoclonal antibody" as used herein refers to an antibody obtained from a population of
substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical
except for possible naturally occurring mutations that may be present in minor amounts. The modifier
"monoclonal" indicates the character of the antibody as being obtained from a substantially homogeneous
population of antibodies, and is not to be construed as requiring production of the antibody by any particular
method. For example, the monoclonal antibodies to be used in accordance with the present invention may be
made by the hybridoma method first described by Kohler & Milstein, Nature 256:495 (1975), or may be made
by recombinant DNA methods [see, e.g. U.S. Patent No. 4,816,567 (Cabilly et al.)].

The monoclonal antibodies herein specifically include "chimeric" antibodies (immunoglobulins) in 
which a portion of the heavy and/or light chain is identical with or homologous to corresponding sequences in 
antibodies derived from a particular species or belonging to a particular antibody class or subclass, while the 
remainder of the chain(s) is identical with or homologous to corresponding sequences in antibodies derived from
another species or belonging to another antibody class or subclass, as well as fragments of such antibodies, so
long as they exhibit the desired biological activity (U.S. Patent No. 4,816,567 (Cabilly et al.); Morrison et al.,
Proc. Natl. Acad. Sci. USA 81, 6851-6855 [1984]).

"Humanized" forms of non-human (e.g. murine) antibodies are chimeric immunoglobulins, 
immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding
subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. For
the most part, humanized antibodies are human immunoglobulins (recipient antibody) in which residues from
a complementarity determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human
species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity.
In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human
residues. Furthermore, a humanized antibody may comprise residues which are found neither in the
recipient antibody nor in the imported CDR or framework sequences. These modifications are made to further
refine and optimize antibody performance. In general, the humanized antibody will comprise substantially all
of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions
correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of
a human immunoglobulin consensus sequence. The humanized antibody optimally also will comprise at least
a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin. For further
details see: Jones et al., Nature 321, 522-525 [1986]; Reichmann et al., Nature 332, 323-329 [1988]; EP-B-239
400 published 30 September 1987; Presta, Curr. Op. Struct. Biol. 2, 593-596 [1992]; and EP-B-451 216
published 24 January 1996.

In the context of the present invention the expressions "cell", "cell line", and "cell culture" are used 
interchangeably, and all such designations include progeny. It is also understood that all progeny may not be 
precisely identical in DNA content, due to deliberate or inadvertent mutations. Mutant progeny that have the 
same function or biological property, as screened for in the originally transformed cell, are included. 

The terms "replicable expression vector", "expression vector" and "vector" refer to a piece of DNA,
usually double-stranded, which may have inserted into it a piece of foreign DNA. Foreign DNA is defined as
heterologous DNA, which is DNA not naturally found in the host cell. The vector is used to transport the foreign
or heterologous DNA into a suitable host cell. Once in the host cell, the vector can replicate independently of
the host chromosomal DNA, and several copies of the vector and its inserted (foreign) DNA may be generated.
In addition, the vector contains the necessary elements that permit translating the foreign DNA into a
polypeptide. Many molecules of the polypeptide encoded by the foreign DNA can thus be rapidly synthesized.

The term "control sequences" refers to DNA sequences necessary for the expression of an operably
linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes,
for example, include a promoter, optionally an operator sequence, a ribosome binding site, and possibly other,
as yet poorly understood, sequences. Eukaryotic cells are known to utilize promoters, polyadenylation signals,
and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic 
acid sequence. For example, DNA for a presequence or a secretory leader is operably linked to DNA for a 
polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or
enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome
binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally,
"operably linked" means that the DNA sequences being linked are contiguous and, in the case of a secretory
leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is
accomplished by ligation at convenient restriction sites. If such sites do not exist, then synthetic oligonucleotide
adaptors or linkers are used in accord with conventional practice.

"Oligonucleotides" are short-length, single- or double-stranded polydeoxynucleotides that are 
chemically synthesized by known methods [such as phosphotriester, phosphite, or phosphoramidite chemistry,
using solid phase techniques such as those described in EP 266,032, published 4 May 1988, or via
deoxynucleoside H-phosphonate intermediates as described by Froehler et al., Nucl. Acids Res. 14, 5399 (1986)].
They are then purified on polyacrylamide gels.

Hybridization is preferably performed under "stringent conditions" which means (1) employing low
ionic strength and high temperature for washing, for example, 0.015 M sodium chloride/0.0015 M sodium
citrate/0.1% sodium dodecyl sulfate at 50°C, or (2) employing during hybridization a denaturing agent, such as
formamide, for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1%
polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium
citrate at 42°C. Another example is use of 50% formamide, 5 x SSC (0.75 M NaCl, 0.075 M sodium citrate),
50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5 x Denhardt's solution, sonicated salmon
sperm DNA (50 µg/ml), 0.1% SDS, and 10% dextran sulfate at 42°C, with washes at 42°C in 0.2 x SSC and
0.1% SDS. Yet another example is hybridization using a buffer of 10% dextran sulfate, 2 x SSC (sodium
chloride/sodium citrate) and 50% formamide at 55°C, followed by a high-stringency wash consisting of 0.1 x
SSC containing EDTA at 55°C.
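
The salt concentrations implied by the SSC multiples quoted above can be checked with a short calculation. The following is a minimal sketch, assuming that 1 x SSC is one fifth of the 5 x SSC composition stated in the text (0.75 M NaCl, 0.075 M sodium citrate); the function name `ssc_molarity` is illustrative only and is not part of the disclosure.

```python
# Assumption: 1 x SSC = 0.15 M NaCl / 0.015 M sodium citrate, derived from the
# 5 x SSC values quoted in the text (0.75 M NaCl, 0.075 M sodium citrate).
NACL_1X_M = 0.15
CITRATE_1X_M = 0.015


def ssc_molarity(fold):
    """Return (NaCl, sodium citrate) molarity for a given SSC strength."""
    if fold < 0:
        raise ValueError("SSC strength must be non-negative")
    return (fold * NACL_1X_M, fold * CITRATE_1X_M)


if __name__ == "__main__":
    # SSC strengths that appear in the hybridization and wash examples.
    for fold in (5, 2, 0.2, 0.1):
        nacl, citrate = ssc_molarity(fold)
        print(f"{fold} x SSC: {nacl:.4f} M NaCl, {citrate:.4f} M sodium citrate")
```

Under this assumption, 0.1 x SSC corresponds to 0.015 M sodium chloride and 0.0015 M sodium citrate, i.e. the same salt concentrations as the low ionic strength wash given in example (1).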

"Immunoadhesins" or "type C lectin - immunoglobulin chimeras" are chimeric antibody-like molecules 
that combine the functional domain(s) of a binding protein (usually a receptor, a cell-adhesion molecule or a
ligand) with an immunoglobulin sequence. The most common example of this type of fusion protein
combines the hinge and Fc regions of an immunoglobulin (Ig) with domains of a cell-surface receptor that
recognizes a specific ligand. This type of molecule is called an "immunoadhesin", because it combines
"immune" and "adhesion" functions; other frequently used names are "Ig-chimera", "Ig-" or "Fc-fusion protein",
or "receptor-globulin."

B. Production of the novel type C lectins by recombinant DNA technology

1. Identification and isolation of nucleic acid encoding the novel type C lectins

The native endocytic type C lectins of the present invention may be isolated from cDNA or genomic
libraries. For example, a suitable cDNA library can be constructed by obtaining polyadenylated mRNA from
cells known to express the desired type C lectin, and using the mRNA as a template to synthesize double stranded
cDNA. Suitable sources of the mRNA are highly endothelialized regions of embryonic and adult mammalian
tissues, and differentiating chondrocytes in the embryo. mRNA encoding native type C lectins of the present
invention is expressed, for example, in human fetal lung, kidney, and liver tissues; adult murine heart, lung,
kidney, brain, and muscle tissues; adult human heart, prostate, testis, ovary, intestine, brain, placenta, lung,
kidney, pancreas, spleen, thymus and colon tissues. The gene encoding the novel type C lectins of the present
invention can also be obtained from a genomic library, such as a human genomic cosmid library, or a
mouse-derived embryonic stem cell (ES) genomic library.

Libraries, either cDNA or genomic, are then screened with probes designed to identify the gene of
interest or the protein encoded by it. For cDNA expression libraries, suitable probes include monoclonal and
polyclonal antibodies that recognize and specifically bind to a type C lectin receptor. For cDNA libraries,
suitable probes include carefully selected oligonucleotide probes (usually of about 20-80 bases in length) that
encode known or suspected portions of a type C lectin polypeptide from the same or different species, and/or
complementary or homologous cDNAs or fragments thereof that encode the same or a similar gene. Appropriate
probes for screening genomic DNA libraries include, without limitation, oligonucleotides, cDNAs, or fragments
thereof that encode the same or a similar gene, and/or homologous genomic DNAs or fragments thereof.
Screening the cDNA or genomic library with the selected probe may be conducted using standard procedures
as described in Chapters 10-12 of Sambrook et al., Molecular Cloning: A Laboratory Manual, New York, Cold
Spring Harbor Laboratory Press, 1989.

If DNA encoding a type C lectin of the present invention is isolated by using carefully selected
oligonucleotide sequences to screen cDNA libraries from various tissues, the oligonucleotide sequences selected
as probes should be sufficient in length and sufficiently unambiguous that false positives are minimized. The
actual nucleotide sequence(s) is/are usually designed based on regions which have the least codon redundancy.
The oligonucleotides may be degenerate at one or more positions. The use of degenerate oligonucleotides is of
particular importance where a library is screened from a species in which preferential codon usage is not known.
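
The criterion of choosing regions with the least codon redundancy can be illustrated with a small calculation. The sketch below assumes the standard genetic code and counts how many distinct nucleotide sequences could encode each stretch of a peptide; the peptide `LSRMWDKA` and the function names are hypothetical and serve only as an illustration, not as a sequence from the present invention.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that can encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(peptide, length):
    """Return (start index, sub-peptide, degeneracy) of the least redundant window."""
    if length < 1 or length > len(peptide):
        raise ValueError("window length must be between 1 and len(peptide)")
    windows = [
        (degeneracy(peptide[i:i + length]), i)
        for i in range(len(peptide) - length + 1)
    ]
    fold, start = min(windows)  # ties resolve to the earliest window
    return start, peptide[start:start + length], fold


if __name__ == "__main__":
    # Hypothetical peptide used purely for illustration.
    print(least_degenerate_window("LSRMWDKA", 4))
```

In this hypothetical example the window containing methionine and tryptophan is selected, because each of these residues is encoded by a single codon and therefore contributes no redundancy to the probe pool.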
The oligonucleotide must be labeled such that it can be detected upon hybridization to DNA in the
library being screened. The preferred method of labeling is to use ATP (e.g., γ-32P) and polynucleotide kinase
to radiolabel the 5' end of the oligonucleotide. However, other methods may be used to label the oligonucleotide,
including, but not limited to, biotinylation or enzyme labeling.

cDNAs encoding the novel type C lectins can also be identified and isolated by other known techniques
of recombinant DNA technology, such as by direct expression cloning, or by using the polymerase chain reaction
(PCR) as described in U.S. Patent No. 4,683,195, issued 28 July 1987, in section 14 of Sambrook et al., supra,
or in Chapter 15 of Current Protocols in Molecular Biology, Ausubel et al. eds., Greene Publishing Associates
and Wiley-Interscience 1991. The use of the PCR technique to amplify a human heart and a mouse heart cDNA
library is described in the examples.

Once cDNA encoding a new native endocytic type C lectin from one species has been isolated, cDNAs
from other species can also be obtained by cross-species hybridization. According to this approach, human or
other mammalian cDNA or genomic libraries are probed by labeled oligonucleotide sequences selected from
known type C lectin sequences (such as murine or human sequences) in accord with known criteria, among which
is that the sequence should be sufficient in length and sufficiently unambiguous that false positives are
minimized. Typically, a 32P-labeled oligonucleotide having about 30 to 50 bases is sufficient, particularly if
the oligonucleotide contains one or more codons for methionine or tryptophan. Isolated nucleic acid will be
DNA that is identified and separated from contaminant nucleic acid encoding other polypeptides from the source
of nucleic acid. Hybridization is preferably performed under "stringent conditions", as hereinabove defined.

Once the sequence is known, the gene encoding a particular type C lectin can also be obtained by
chemical synthesis, following one of the methods described in Engels and Uhlmann, Angew. Chem. Int. Ed. Engl.
28, 716 (1989). These methods include triester, phosphite, phosphoramidite and H-phosphonate methods, PCR
and other autoprimer methods, and oligonucleotide syntheses on solid supports.

2. Cloning and expression of nucleic acid encoding the novel type C lectins

Once the nucleic acid encoding a novel type C lectin is available, it is generally ligated into a replicable 
expression vector for further cloning (amplification of the DNA), or for expression. 

Expression and cloning vectors are well known in the art and contain a nucleic acid sequence that
enables the vector to replicate in one or more selected host cells. The selection of the appropriate vector will
depend on 1) whether it is to be used for DNA amplification or for DNA expression, 2) the size of the DNA to
be inserted into the vector, and 3) the host cell to be transformed with the vector. Each vector contains various
components depending on its function (amplification of DNA or expression of DNA) and the host cell with which
it is compatible. The vector components generally include, but are not limited to, one or more of the following:
a signal sequence, an origin of replication, one or more marker genes, an enhancer element, a promoter, and a
transcription termination sequence. Construction of suitable vectors containing one or more of the above listed
components, the desired coding and control sequences, employs standard ligation techniques. Isolated plasmids
or DNA fragments are cleaved, tailored, and religated in the form desired to generate the plasmids required. For
analysis to confirm correct sequences in plasmids constructed, the ligation mixtures are commonly used to
transform E. coli cells, e.g. E. coli K12 strain 294 (ATCC 31,446), and successful transformants are selected by
ampicillin or tetracycline resistance where appropriate. Plasmids from the transformants are prepared, analyzed
by restriction endonuclease digestion, and/or sequenced by the method of Messing et al., Nucleic Acids Res. 9,
309 (1981) or by the method of Maxam et al., Methods in Enzymology 65, 499 (1980).

The polypeptides of the present invention may be expressed in a variety of prokaryotic and eukaryotic
host cells. Suitable prokaryotes include gram negative or gram positive organisms, for example E. coli or bacilli.
A preferred cloning host is E. coli 294 (ATCC 31,446) although other gram negative or gram positive
prokaryotes such as E. coli B, E. coli X1776 (ATCC 31,537), E. coli W3110 (ATCC 27,325), Pseudomonas
species, or Serratia marcescens are suitable.

In addition to prokaryotes, eukaryotic microbes such as filamentous fungi or yeast are suitable hosts for
vectors herein. Saccharomyces cerevisiae, or common baker's yeast, is the most commonly used among lower
eukaryotic host microorganisms. However, a number of other genera, species and strains are commonly available
and useful herein, such as S. pombe [Beach and Nurse, Nature 290, 140 (1981)], Kluyveromyces lactis
[Louvencourt et al., J. Bacteriol. 737 (1983)]; yarrowia (EP 402,226); Pichia pastoris (EP 183,070), Trichoderma
reesia (EP 244,234), Neurospora crassa [Case et al., Proc. Natl. Acad. Sci. USA 76, 5259-5263 (1979)]; and
Aspergillus hosts such as A. nidulans [Ballance et al., Biochem. Biophys. Res. Commun. 112, 284-289 (1983);
Tilburn et al., Gene 26, 205-221 (1983); Yelton et al., Proc. Natl. Acad. Sci. USA 81, 1470-1474 (1984)] and
A. niger [Kelly and Hynes, EMBO J. 4, 475-479 (1985)].

Suitable host cells may also derive from multicellular organisms. Such host cells are capable of
complex processing and glycosylation activities. In principle, any higher eukaryotic cell culture is workable,
whether from vertebrate or invertebrate culture, although cells from mammals such as humans are preferred.
Examples of invertebrate cells include plants and insect cells. Numerous baculoviral strains and variants and
corresponding permissive insect host cells from hosts such as Spodoptera frugiperda (caterpillar), Aedes aegypti
(mosquito), Aedes albopictus (mosquito), Drosophila melanogaster (fruitfly), and Bombyx mori host cells have
been identified. See, e.g. Luckow et al., Bio/Technology 6, 47-55 (1988); Miller et al., in Genetic Engineering,
Setlow, J.K. et al., eds., Vol. 8 (Plenum Publishing, 1986), pp. 277-279; and Maeda et al., Nature 315, 592-594
(1985). A variety of such viral strains are publicly available, e.g. the L-1 variant of Autographa californica NPV,
and such viruses may be used as the virus herein according to the present invention, particularly for transfection
of Spodoptera frugiperda cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia, tomato, and tobacco can be utilized as hosts.
Typically, plant cells are transfected by incubation with certain strains of the bacterium Agrobacterium
tumefaciens, which has been previously manipulated to contain the type C lectin DNA. During incubation of
the plant cell culture with A. tumefaciens, the DNA encoding a type C lectin is transferred to the plant cell host
such that it is transfected, and will, under appropriate conditions, express the type C lectin DNA. In addition,
regulatory and signal sequences compatible with plant cells are available, such as the nopaline synthase promoter
and polyadenylation signal sequences. Depicker et al., J. Mol. Appl. Gen. 1, 561 (1982). In addition, DNA
segments isolated from the upstream region of the T-DNA 780 gene are capable of activating or increasing
transcription levels of plant-expressible genes in recombinant DNA-containing plant tissue. See EP 321,196
published 21 June 1989.

However, interest has been greatest in vertebrate cells, and propagation of vertebrate cells in culture
(tissue culture) is per se well known. See Tissue Culture, Academic Press, Kruse and Patterson, editors (1973).
Examples of useful mammalian host cell lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC
CRL 1651); human embryonic kidney cell line [293 or 293 cells subcloned for growth in suspension culture,
Graham et al., J. Gen. Virol. 36, 59 (1977)]; baby hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster
ovary cells/-DHFR [CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA 77, 4216 (1980)]; mouse Sertoli cells
[TM4, Mather, Biol. Reprod. 23, 243-251 (1980)]; monkey kidney cells (CV1 ATCC CCL 70); African green
monkey kidney cells (VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HELA, ATCC CCL 2);
canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human lung
cells (WI38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse mammary tumor (MMT 060562,
ATCC CCL 51); TRI cells [Mather et al., Annals N.Y. Acad. Sci. 383, 44-68 (1982)]; MRC 5 cells; FS4 cells;
and a human hepatoma cell line (Hep G2). Preferred host cells are human embryonic kidney 293 and Chinese
hamster ovary cells.

Particularly useful in the practice of this invention are expression vectors that provide for the transient
expression in mammalian cells of DNA encoding a novel type C lectin herein. In general, transient expression
involves the use of an expression vector that is able to replicate efficiently in a host cell, such that the host cell
accumulates many copies of the expression vector and, in turn, synthesizes high levels of a desired polypeptide
encoded by the expression vector. Transient systems, comprising a suitable expression vector and a host cell,
allow for the convenient positive identification of polypeptides encoded by cloned DNAs, as well as for the rapid
screening of such polypeptides for desired biological or physiological properties. Thus, transient expression
systems are particularly useful in the invention for purposes of identifying analogs and variants of a native type
C lectin herein.

Other methods, vectors, and host cells suitable for adaptation to the synthesis of the type C lectins in
recombinant vertebrate cell culture are described in Gething et al., Nature 293, 620-625 (1981); Mantei et al.,
Nature 281, 40-46 (1979); Levinson et al.; EP 117,060 and EP 117,058. Particularly useful plasmids for
mammalian cell culture expression of the type C lectin polypeptides are pRK5 (EP 307,247), or pSVI6B (PCT
Publication No. WO 91/08291).

Other cloning and expression vectors suitable for the expression of the type C lectins of the present
invention in a variety of host cells are, for example, described in EP 457,758 published 27 November 1991. A
large variety of expression vectors is now commercially available. An exemplary commercial yeast expression
vector is pPIC.9 (Invitrogen), while a commercially available expression vector suitable for transformation of
E. coli cells is pET15b (Novagen).

C. Culturing the Host Cells 

Prokaryote cells used to produce the type C lectins of this invention are cultured in suitable media as
described generally in Sambrook et al., supra.

Mammalian cells can be cultured in a variety of media. Commercially available media such as Ham's
F10 (Sigma), Minimal Essential Medium (MEM, Sigma), RPMI-1640 (Sigma), and Dulbecco's Modified Eagle's
Medium (DMEM, Sigma) are suitable for culturing the host cells. In addition, any of the media described in
Ham and Wallace, Meth. Enzymol. 58, 44 (1979); Barnes and Sato, Anal. Biochem. 102, 255 (1980), US
4,767,704; 4,657,866; 4,927,762; or 4,560,655; WO 90/03430; WO 87/00195 or US Pat. Re. 30,985 may be
used as culture media for the host cells. Any of these media may be supplemented as necessary with hormones
and/or other growth factors (such as insulin, transferrin, or epidermal growth factor), salts (such as sodium
chloride, calcium, magnesium, and phosphate), buffers (such as HEPES), nucleosides (such as adenosine and
thymidine), antibiotics (such as Gentamycin™ drug), trace elements (defined as inorganic compounds usually
present at final concentrations in the micromolar range), and glucose or an equivalent energy source. Any other
necessary supplements may also be included at appropriate concentrations that would be known to those skilled
in the art. The culture conditions, such as temperature, pH and the like, suitably are those previously used with
the host cell selected for cloning or expression, as the case may be, and will be apparent to the ordinary artisan.

The host cells referred to in this disclosure encompass cells in in vitro cell culture as well as cells that
are within a host animal or plant.

It is further envisioned that the type C lectins of this invention may be produced by homologous
recombination, or with recombinant production methods utilizing control elements introduced into cells already
containing DNA encoding the particular type C lectin.

D. Detecting Gene Amplification/Expression

Gene amplification and/or expression may be measured in a sample directly, for example, by
conventional Southern blotting, Northern blotting to quantitate the transcription of mRNA [Thomas, Proc. Natl.
Acad. Sci. USA 77, 5201-5205 (1980)], dot blotting (DNA analysis), or in situ hybridization, using an
appropriately labeled probe, based on the sequences provided herein. Various labels may be employed, most
commonly radioisotopes, particularly 32P. However, other techniques may also be employed, such as using
biotin-modified nucleotides for introduction into a polynucleotide. The biotin then serves as a site for binding
to avidin or antibodies, which may be labeled with a wide variety of labels, such as radionuclides, fluorescers,
enzymes, or the like. Alternatively, antibodies may be employed that can recognize specific duplexes, including
DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. The antibodies in
turn may be labeled and the assay may be carried out where the duplex is bound to the surface, so that upon the
formation of duplex on the surface, the presence of antibody bound to the duplex can be detected.

Gene expression, alternatively, may be measured by immunological methods, such as
immunohistochemical staining of tissue sections and assay of cell culture or body fluids, to quantitate directly
the expression of gene product. A particularly sensitive staining technique suitable for use in the present
invention is described by Hsu et al., Am. J. Clin. Path. 75, 734-738 (1980).

Antibodies useful for immunohistochemical staining and/or assay of sample fluids may be either 
monoclonal or polyclonal, and may be prepared in any animal. Conveniently, the antibodies may be prepared 
against a native type C lectin polypeptide, or against a synthetic peptide based on the DNA sequence provided 
herein as described further hereinbelow.

E. Amino Acid Sequence Variants of native type C lectins

Amino acid sequence variants of native type C lectins are prepared by methods known in the art by
introducing appropriate nucleotide changes into a native type C lectin DNA, or by in vitro synthesis of the
desired polypeptide. There are two principal variables in the construction of amino acid sequence variants: the
location of the mutation site and the nature of the mutation. With the exception of naturally-occurring alleles,
which do not require the manipulation of the DNA sequence encoding the native type C lectin, the amino acid
sequence variants of type C lectins are preferably constructed by mutating the DNA, either to arrive at an allele
or an amino acid sequence variant that does not occur in nature.

One group of mutations will be created within the fibronectin type II domain or within one or more of
the type C lectin domains (preferably within the lectin-like domains 1-3) of a novel native type C lectin of the
present invention. These domains are believed to be functionally important; therefore, alterations, such as
non-conservative substitutions, insertions and/or deletions in these regions are expected to result in genuine changes
in the properties of the native receptor molecules. The tyrosine residue at position 1451 of the novel murine and
human type C lectins and the surrounding amino acids are also believed to have a functional significance, since
this tyrosine is conserved in type C lectins, and has been previously found to be important for the endocytosis
of the phospholipase A2 receptor. Accordingly, amino acid alterations in this region are also believed to result
in variants with properties significantly different from the corresponding native polypeptides. Non-conservative
substitutions within these functionally important domains may result in variants which lose the carbohydrate
recognition and binding ability of their native counterparts, or have increased carbohydrate recognition properties
or enhanced selectivity as compared to the corresponding native proteins.

Alternatively or in addition, amino acid alterations can be made at sites that differ in novel type C lectins
from various species, or in highly conserved regions, depending on the goal to be achieved. Sites at such
locations will typically be modified in series, e.g. by (1) substituting first with conservative choices and then with
more radical selections depending upon the results achieved, (2) deleting the target residue or residues, or (3)
inserting residues of the same or different class adjacent to the located site, or combinations of options 1-3. One
helpful technique is called "alanine scanning" (Cunningham and Wells, Science 244, 1081-1085 [1989]).
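
The idea behind alanine scanning, namely replacing residues one at a time with alanine and testing each single-site variant, can be sketched in a few lines. The following is an illustrative sketch only: the function name `alanine_scan`, the labelling scheme and the short sequence `MKYAD` are hypothetical and are not taken from the cited reference or from the sequences of the present invention.

```python
def alanine_scan(sequence, start=1, end=None):
    """List single-site alanine variants for positions start..end (1-based, inclusive).

    Each entry is (label, variant sequence), with the label written as
    <native residue><position>A. Positions that are already alanine are skipped.
    """
    end = len(sequence) if end is None else end
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("start/end must lie within the sequence")
    variants = []
    for pos in range(start, end + 1):
        native = sequence[pos - 1]
        if native == "A":
            continue  # nothing to substitute at this position
        mutant = sequence[:pos - 1] + "A" + sequence[pos:]
        variants.append((f"{native}{pos}A", mutant))
    return variants


if __name__ == "__main__":
    # Hypothetical five-residue sequence used purely for illustration.
    for label, variant in alanine_scan("MKYAD"):
        print(label, variant)
```

In practice the scan would be restricted, through the `start` and `end` arguments, to a region of interest such as one of the domains discussed above, and each variant would then be made and assayed individually.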

In yet another group of the variant type C lectins of the present invention, one or more of the
functionally less significant domains may be deleted or inactivated. For example, the deletion or inactivation
of the transmembrane domain yields soluble variants of the native proteins. Alternatively, or in addition, the
cytoplasmic domain may be deleted, truncated or otherwise altered.

Naturally-occurring amino acids are divided into groups based on common side chain properties:

(1) hydrophobic: norleucine, met, ala, val, leu, ile;

(2) neutral hydrophilic: cys, ser, thr;

(3) acidic: asp, glu;

(4) basic: asn, gln, his, lys, arg;

(5) residues that influence chain orientation: gly, pro; and

(6) aromatic: trp, tyr, phe.

Conservative substitutions involve exchanging a member within one group for another member within
the same group, whereas non-conservative substitutions will entail exchanging a member of one of these classes
for another. Substantial changes in function or immunological identity are made by selecting substitutions that
are less conservative, i.e. differ more significantly in their effect on maintaining (a) the structure of the
polypeptide backbone in the area of substitution, for example as a sheet or helical conformation, (b) the charge
or hydrophobicity of the molecule at the target site or (c) the bulk of the side chain. The substitutions which in
general are expected to produce the greatest changes in the properties of the novel native type C lectins of the
present invention will be those in which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by)
a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted
for (or by) any other residue; (c) a residue having an electropositive side chain, e.g. lysyl, arginyl, or histidyl,
is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky
side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g. glycine.
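
The grouping rule above (a substitution is conservative when both residues fall in the same group) can be expressed as a simple lookup. The sketch below assumes the six groups exactly as listed, written with one-letter codes; norleucine is omitted because it has no standard one-letter code, and the names `GROUPS`, `group_of` and `is_conservative` are illustrative only.

```python
# The six side-chain groups listed in the text, as one-letter codes.
# Assumption: norleucine (group 1) is left out, since it has no standard
# one-letter code.
GROUPS = {
    1: "MAVLI",   # met, ala, val, leu, ile
    2: "CST",     # cys, ser, thr
    3: "DE",      # asp, glu
    4: "NQHKR",   # asn, gln, his, lys, arg
    5: "GP",      # gly, pro
    6: "WYF",     # trp, tyr, phe
}

# Reverse lookup: residue -> group number.
GROUP_OF = {aa: number for number, members in GROUPS.items() for aa in members}


def group_of(residue):
    """Return the group number (1-6) of a one-letter amino acid code."""
    try:
        return GROUP_OF[residue.upper()]
    except KeyError:
        raise ValueError(f"unknown residue: {residue!r}") from None


def is_conservative(native, replacement):
    """True if both residues belong to the same group in the table above."""
    return group_of(native) == group_of(replacement)


if __name__ == "__main__":
    for native, replacement in (("D", "E"), ("K", "R"), ("S", "L"), ("F", "G")):
        kind = "conservative" if is_conservative(native, replacement) else "non-conservative"
        print(f"{native} -> {replacement}: {kind}")
```

Such a lookup only reflects group membership; it does not capture the further considerations named in the text, such as backbone conformation or side-chain bulk at the target site.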

Substitutional variants of the novel type C lectins of the present invention also include variants where
functionally homologous (having at least about 40%-50% homology) domains of other proteins are substituted
by routine methods for one or more of the above-identified domains within the novel type C lectin structure. For
example, the cysteine-rich domain, the fibronectin type II domain, or one or more of the first three carbohydrate
recognition domains (CRDs) of a novel type C lectin of the present invention can be replaced by a corresponding
domain of a macrophage mannose receptor, a phospholipase A2 receptor or a DEC 205 receptor.

Amino acid sequence deletions generally range from about 1 to 30 residues, more preferably about 1
to 10 residues, and typically are contiguous. Typically, the transmembrane and cytoplasmic domains, or only
the cytoplasmic domains are deleted. However, deletion from the C-terminus to any other suitable site N-terminal
to the transmembrane region which preserves the biological activity or immunological cross-reactivity of a native
type C lectin is suitable.

A preferred class of substitutional and/or deletional variants of the present invention are those involving
a transmembrane region of a novel type C lectin molecule. Transmembrane regions are highly hydrophobic or
lipophilic domains that are the proper size to span the lipid bilayer of the cellular membrane. They are believed
to anchor the lectin in the cell membrane, and allow for homo- or heteropolymeric complex formation.
Inactivation of the transmembrane domain, typically by deletion or substitution of transmembrane domain
hydroxylation residues, will facilitate recovery and formulation by reducing its cellular or membrane lipid affinity
and improving its aqueous solubility. If the transmembrane and cytoplasmic domains are deleted one avoids the
introduction of potentially immunogenic epitopes, whether by exposure of otherwise intracellular polypeptides that
might be recognized by the body as foreign or by insertion of heterologous polypeptides that are potentially
immunogenic. Inactivation of the membrane binding function is accomplished by deletion of sufficient residues
to produce a substantially hydrophilic hydropathy profile at this site or by substituting with heterologous residues
which accomplish the same result.

A principal advantage of the transmembrane inactivated variants of the type C lectins of the present
invention is that they may be secreted into the culture medium of recombinant hosts. These variants are soluble
in body fluids such as blood and do not have an appreciable affinity for cell membrane lipids, thus considerably
simplifying their recovery from recombinant cell culture. As a general proposition, such soluble variants will
not have a functional transmembrane domain and preferably will not have a functional cytoplasmic domain. For
example, the transmembrane domain may be substituted by any amino acid sequence, e.g. a random or
predetermined sequence of about 5 to 50 serine, threonine, lysine, arginine, glutamine, aspartic acid and like
hydrophilic residues, which altogether exhibit a hydrophilic hydropathy profile. Like the deletional (truncated)
soluble variants, these variants are secreted into the culture medium of recombinant hosts.
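
The hydropathy criterion described above can be illustrated numerically. The following is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale (the specification does not name a particular scale) and two hypothetical sequences: `TM_LIKE`, an invented hydrophobic stretch standing in for a transmembrane region, and `LINKER`, an invented replacement built from the hydrophilic residues listed above. Neither sequence is taken from the type C lectins of the invention.

```python
# Kyte-Doolittle hydropathy index per residue (positive = hydrophobic).
# Use of this particular scale is an assumption made for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average hydropathy index over a one-letter amino acid sequence."""
    return sum(KD[res] for res in seq) / len(seq)


def window_profile(seq, window=9):
    """Sliding-window mean hydropathy along the sequence."""
    return [
        mean_hydropathy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical sequences, invented for this example only.
TM_LIKE = "LLIVALLVVGLLAVILLAVGL"   # hydrophobic, membrane-spanning-like
LINKER = "STKRQDSTKRQDSTKRQD"       # Ser/Thr/Lys/Arg/Gln/Asp replacement

print("TM-like mean hydropathy:", round(mean_hydropathy(TM_LIKE), 2))
print("Linker mean hydropathy: ", round(mean_hydropathy(LINKER), 2))
print("Most hydrophobic linker window:", round(max(window_profile(LINKER)), 2))
```

Under these assumptions the hydrophobic stretch scores positive and every window of the replacement sequence scores negative, which is the kind of shift in hydropathy profile that the substitution is meant to produce.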

Amino acid insertions include amino- and/or carboxyl-terminal fusions ranging in length from one
residue to polypeptides containing a hundred or more residues, as well as intrasequence insertions of single or
multiple amino acid residues. Intrasequence insertions (i.e. insertions within the novel type C lectin amino acid
sequence) may range generally from about 1 to 10 residues, more preferably 1 to 5 residues, most preferably 1
to 3 residues. Examples of terminal insertions include the type C lectins with an N-terminal methionyl residue,
an artifact of direct expression in bacterial recombinant cell culture, and fusion of a heterologous N-terminal
signal sequence to the N-terminus of the type C lectin molecule to facilitate the secretion of the mature type C
lectin from recombinant host cells. Such signal sequences will generally be obtained from, and thus homologous
to, the intended host cell species. Suitable sequences include STII or lpp for E. coli, alpha factor for yeast, and
viral signals such as herpes gD for mammalian cells.

Other insertional variants of the native type C lectin molecules include the fusion of the N- or C-terminus
of the type C lectin molecule to immunogenic polypeptides, e.g. bacterial polypeptides such as
beta-lactamase or an enzyme encoded by the E. coli trp locus, or yeast protein, and C-terminal fusions with proteins
having a long half-life such as immunoglobulin regions (preferably immunoglobulin constant regions), albumin,
or ferritin, as described in WO 89/02922 published on 6 April 1989.

Further insertional variants are immunologically active derivatives of the novel type C lectins, which
comprise the lectin and a polypeptide containing an epitope of an immunologically competent extraneous
polypeptide, i.e. a polypeptide which is capable of eliciting an immune response in the animal to which the fusion
is to be administered or which is capable of being bound by an antibody raised against an extraneous
polypeptide. Typical examples of such immunologically competent polypeptides are allergens, autoimmune
epitopes, or other potent immunogens or antigens recognized by pre-existing antibodies in the fusion recipient,
including bacterial polypeptides such as trpLE, β-galactosidase, viral polypeptides such as herpes gD protein, and
the like.

Immunogenic fusions are produced by cross-linking in vitro or by recombinant cell culture transformed
with DNA encoding an immunogenic polypeptide. It is preferable that the immunogenic fusion be one in which
the immunogenic sequence is joined to or inserted into a novel type C lectin molecule or fragment thereof by (a)
peptide bond(s). These products therefore consist of a linear polypeptide chain containing the type C lectin
epitope and at least one epitope foreign to the type C lectin. It will be understood that it is within the scope of
this invention to introduce the epitopes anywhere within a type C lectin molecule of the present invention or a
fragment thereof. These immunogenic insertions are particularly useful when formulated into a
pharmacologically acceptable carrier and administered to a subject in order to raise antibodies against the type
C lectin molecule, which antibodies in turn are useful as diagnostics, in tissue-typing, or in purification of the
novel type C lectins by immunoaffinity techniques known per se. Alternatively, in the purification of the type
C lectins of the present invention, binding partners for the fused extraneous polypeptide, e.g. antibodies,
receptors or ligands, are used to adsorb the fusion from impure admixtures, after which the fusion is eluted and,
if desired, the novel type C lectin is recovered from the fusion, e.g. by enzymatic cleavage.

Since it is often difficult to predict in advance the characteristics of a variant type C lectin, it will be 
appreciated that some screening will be needed to select the optimum variant. 

After identifying the desired mutation(s), the gene encoding a type C lectin variant can, for example,
be obtained by chemical synthesis as hereinabove described. More preferably, DNA encoding a type C lectin
amino acid sequence variant is prepared by site-directed mutagenesis of DNA that encodes an earlier prepared
variant or a non-variant version of the type C lectin. Site-directed (site-specific) mutagenesis allows the
production of type C lectin variants through the use of specific oligonucleotide sequences that encode the DNA
sequence of the desired mutation, as well as a sufficient number of adjacent nucleotides, to provide a primer
sequence of sufficient size and sequence complexity to form a stable duplex on both sides of the deletion junction
being traversed. Typically, a primer of about 20 to 25 nucleotides in length is preferred, with about 5 to 10
residues on both sides of the junction of the sequence being altered. In general, the techniques of site-specific
mutagenesis are well known in the art, as exemplified by publications such as Edelman et al., DNA 2, 183
(1983). As will be appreciated, the site-specific mutagenesis technique typically employs a phage vector that
exists in both a single-stranded and double-stranded form. Typical vectors useful in site-directed mutagenesis
include vectors such as the M13 phage, for example, as disclosed by Messing et al., Third Cleveland Symposium
on Macromolecules and Recombinant DNA, A. Walton, ed., Elsevier, Amsterdam (1981). This and other phage
vectors are commercially available and their use is well known to those skilled in the art. A versatile and
efficient procedure for the construction of oligodeoxyribonucleotide directed site-specific mutations in DNA
fragments using M13-derived vectors was published by Zoller, M.J. and Smith, M., Nucleic Acids Res. 10,
6487-6500 [1982]. Also, plasmid vectors that contain a single-stranded phage origin of replication (Veira et al.,
Meth. Enzymol. 153, 3 [1987]) may be employed to obtain single-stranded DNA. Alternatively, nucleotide
substitutions are introduced by synthesizing the appropriate DNA fragment in vitro, and amplifying it by PCR
procedures known in the art.
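
The primer geometry described above (an altered segment flanked on each side by unchanged template bases) can be sketched as follows. This is an illustrative sketch only: the function name `mutagenic_primer`, the template sequence and the chosen codon change are hypothetical and are not taken from the type C lectin sequences of the invention. The sketch also ignores practical considerations such as strand choice, melting temperature and secondary structure.

```python
def mutagenic_primer(template, site_start, site_end, replacement, flank=10):
    """Build an oligonucleotide that carries `replacement` in place of
    template[site_start:site_end], flanked on each side by `flank`
    unchanged template nucleotides (0-based, end-exclusive coordinates)."""
    if site_start - flank < 0 or site_end + flank > len(template):
        raise ValueError("not enough flanking template sequence")
    left = template[site_start - flank:site_start]
    right = template[site_end:site_end + flank]
    return left + replacement + right


# Hypothetical 33-nt template with an AAC (Asn) codon at positions 15-17.
TEMPLATE = "GCTAGCTTGACCGGT" + "AAC" + "GGTTCCGAACTGGCT"

# Replace the AAC codon by CAG (Gln), keeping 10 matching bases per side.
primer = mutagenic_primer(TEMPLATE, 15, 18, "CAG", flank=10)
print(primer, len(primer))
```

With a three-nucleotide change and ten flanking bases on each side, the resulting primer is 23 nucleotides long, which falls within the 20 to 25 nucleotide range stated above.
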

The PCR technique may also be used in creating amino acid sequence variants of a novel type C lectin.
In a specific example of PCR mutagenesis, template plasmid DNA (1 µg) is linearized by digestion with a
restriction endonuclease that has a unique recognition site in the plasmid DNA outside of the region to be
amplified. Of this material, 100 ng is added to a PCR mixture containing PCR buffer, which contains the four
deoxynucleotide triphosphates and is included in the GeneAmp® kits (obtained from Perkin-Elmer Cetus,
Norwalk, CT and Emeryville, CA), and 25 pmole of each oligonucleotide primer, to a final volume of 50 µl.
The reaction mixture is overlayered with 35 µl mineral oil. The reaction is denatured for 5 minutes at 100°C,
placed briefly on ice, and then 1 µl Thermus aquaticus (Taq) DNA polymerase (5 units/µl, purchased from
Perkin-Elmer Cetus, Norwalk, CT and Emeryville, CA) is added below the mineral oil layer. The reaction
mixture is then inserted into a DNA Thermal Cycler (purchased from Perkin-Elmer Cetus) programmed as
follows:

    2 min. 55°C,
    30 sec. 72°C, then 19 cycles of the following:
    30 sec. 94°C,
    30 sec. 55°C, and
    30 sec. 72°C.

At the end of the program, the reaction vial is removed from the thermal cycler and the aqueous phase
transferred to a new vial, extracted with phenol/chloroform (50:50 vol), and ethanol precipitated, and the DNA
is recovered by standard procedures. This material is subsequently subjected to appropriate treatments for
insertion into a vector.
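
The quantities in the PCR example above lend themselves to a short worked calculation. The sketch below encodes the cycling program as data and computes the programmed hold time and the nominal primer and template concentrations. The variable and function names are hypothetical; ramp times between temperatures and the separate 5 minute denaturation at 100°C are not included, so the actual run time will be longer.

```python
# Thermal cycler program from the example, as (temperature in deg C, seconds).
PRE_CYCLE = [(55, 120), (72, 30)]          # 2 min at 55, then 30 sec at 72
CYCLE = [(94, 30), (55, 30), (72, 30)]     # repeated block
N_CYCLES = 19


def total_seconds(pre_cycle, cycle, n_cycles):
    """Sum of programmed hold times, ignoring temperature ramps."""
    pre = sum(seconds for _, seconds in pre_cycle)
    per_cycle = sum(seconds for _, seconds in cycle)
    return pre + n_cycles * per_cycle


def micromolar(pmol, volume_ul):
    """pmol per microliter equals micromol per liter (micromolar)."""
    return pmol / volume_ul


hold_time_s = total_seconds(PRE_CYCLE, CYCLE, N_CYCLES)
primer_uM = micromolar(25, 50)             # 25 pmole of each primer in 50 ul
template_ng_per_ul = 100 / 50              # 100 ng template in 50 ul

print("Programmed hold time:", hold_time_s, "s =", hold_time_s / 60, "min")
print("Each primer:", primer_uM, "uM")
print("Template:", template_ng_per_ul, "ng/ul")
```

On these figures the program holds for 1860 seconds (31 minutes), each primer is present at 0.5 µM, and the template at 2 ng/µl.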

Another method for preparing variants, cassette mutagenesis, is based on the technique described by
Wells et al. [Gene 34, 315 (1985)].

Additionally, the so-called phagemid display method may be useful in making amino acid sequence
variants of native or variant type C lectins or their fragments. This method involves (a) constructing a replicable
expression vector comprising a first gene encoding a receptor to be mutated, a second gene encoding at least
a portion of a natural or wild-type phage coat protein wherein the first and second genes are heterologous, and
a transcription regulatory element operably linked to the first and second genes, thereby forming a gene fusion
encoding a fusion protein; (b) mutating the vector at one or more selected positions within the first gene thereby
forming a family of related plasmids; (c) transforming suitable host cells with the plasmids; (d) infecting the
transformed host cells with a helper phage having a gene encoding the phage coat protein; (e) culturing the
transformed infected host cells under conditions suitable for forming recombinant phagemid particles containing
at least a portion of the plasmid and capable of transforming the host, the conditions adjusted so that no more
than a minor amount of phagemid particles display more than one copy of the fusion protein on the surface of
the particle; (f) contacting the phagemid particles with a suitable antigen so that at least a portion of the phagemid
particles bind to the antigen; and (g) separating the phagemid particles that bind from those that do not. Steps
(d) through (g) can be repeated one or more times. Preferably in this method the plasmid is under tight control
of the transcription regulatory element, and the culturing conditions are adjusted so that the amount or number
of phagemid particles displaying more than one copy of the fusion protein on the surface of the particle is less
than about 1%. Also, preferably, the amount of phagemid particles displaying more than one copy of the fusion
protein is less than 10% of the amount of phagemid particles displaying a single copy of the fusion protein. Most
preferably, the amount is less than 20%. Typically in this method, the expression vector will further contain a
secretory signal sequence fused to the DNA encoding each subunit of the polypeptide and the transcription
regulatory element will be a promoter system. Preferred promoter systems are selected from lac Z, λPL, tac, T7
polymerase, tryptophan, and alkaline phosphatase promoters and combinations thereof. Also, normally the
method will employ a helper phage selected from M13K07, M13R408, M13-VCS, and Phi X 174. The preferred
helper phage is M13K07, and the preferred coat protein is the M13 Phage gene III coat protein. The preferred
host is E. coli, and protease-deficient strains of E. coli.

Further details of the foregoing and similar mutagenesis techniques are found in general textbooks, such
as, for example, Sambrook et al., supra, and Current Protocols in Molecular Biology, Ausubel et al. eds., supra.

F. Glycosylation variants 

Glycosylation variants are included within the scope of the present invention. They include variants
completely lacking in glycosylation (unglycosylated), variants having at least one less glycosylated site than the
native form (deglycosylated) as well as variants in which the glycosylation has been changed. Included are
deglycosylated and unglycosylated amino acid sequence variants, deglycosylated and unglycosylated native type
C lectins, and other glycosylation variants. For example, substitutional or deletional mutagenesis may be
employed to eliminate the N- or O-linked glycosylation sites in a native or variant type C lectin of the present
invention, e.g. the asparagine residue may be deleted or replaced by a basic residue such as lysine or
histidine. Alternatively, flanking residues making up the glycosylation site may be substituted or deleted,
even though the asparagine residues remain unchanged, in order to prevent glycosylation by eliminating the
glycosylation recognition site.
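
The two routes to removing an N-linked site described above (altering the asparagine itself, or altering a flanking residue) can be illustrated with a short sketch. It assumes the commonly used N-linked consensus motif Asn-X-Ser/Thr, where X is any residue except proline; the specification itself does not spell out the motif. The peptide used is hypothetical and is not a type C lectin sequence.

```python
import re

# Assumed N-linked consensus motif: Asn, any residue except Pro, then Ser/Thr.
# The lookahead lets overlapping candidate sites be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return (1-based position of Asn, tripeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


# Hypothetical peptide with two candidate sites and one Pro-blocked site.
native = "MKNGTLLNPSAANASQ"
asn_substituted = "MKQGTLLNPSAANASQ"     # Asn3 -> Gln
flank_substituted = "MKNGALLNPSAANASQ"   # Thr5 -> Ala, Asn3 unchanged

print(find_sequons(native))
print(find_sequons(asn_substituted))
print(find_sequons(flank_substituted))
```

In this toy example either change removes the first candidate site while leaving the second intact. A motif scan of this kind only flags candidate sites; whether a given site is actually glycosylated depends on the host cell and must be determined experimentally.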

Additionally, unglycosylated type C lectins which have the glycosylation sites of a native molecule may 
be produced in recombinant prokaryotic cell culture because prokaryotes are incapable of introducing 
glycosylation into polypeptides. 

Glycosylation variants may be produced by selecting appropriate host cells or by in vitro methods.
Yeast and insect cells, for example, introduce glycosylation which varies significantly from that of mammalian
systems. Similarly, mammalian cells having a different species (e.g. hamster, murine, porcine, bovine or ovine),
or tissue origin (e.g. lung, liver, lymphoid, mesenchymal or epidermal) than the source of the type C lectin are
routinely screened for the ability to introduce variant glycosylation as characterized for example by elevated
levels of mannose or variant ratios of mannose, fucose, sialic acid, and other sugars typically found in
mammalian glycoproteins. In vitro processing of the type C lectin typically is accomplished by enzymatic
hydrolysis, e.g. neuraminidase digestion.

G. Covalent Modifications 

Covalent modifications of the novel type C lectins of the present invention are included within the scope
herein. Such modifications are traditionally introduced by reacting targeted amino acid residues of the type C
lectins with an organic derivatizing agent that is capable of reacting with selected side chains or terminal residues, or
by harnessing mechanisms of post-translational modifications that function in selected recombinant host cells.
The resultant covalent derivatives are useful in programs directed at identifying residues important for biological
activity, for immunoassays of the type C lectin, or for the preparation of anti-type C lectin antibodies for
immunoaffinity purification of the recombinant. For example, complete inactivation of the biological activity
of the protein after reaction with ninhydrin would suggest that at least one arginyl or lysyl residue is critical for
its activity, whereafter the individual residues which were modified under the conditions selected are identified
by isolation of a peptide fragment containing the modified amino acid residue. Such modifications are within
the ordinary skill in the art and are performed without undue experimentation.

Derivatization with bifunctional agents is useful for preparing intramolecular aggregates of the type C
lectins with polypeptides as well as for cross-linking the type C lectin polypeptide to a water insoluble support
matrix or surface for use in assays or affinity purification. In addition, a study of interchain cross-links will
provide direct information on conformational structure. Commonly used cross-linking agents include
1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, homobifunctional imidoesters,
and bifunctional maleimides. Derivatizing agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate yield
photoactivatable intermediates which are capable of forming cross-links in the presence of light. Alternatively,
reactive water insoluble matrices such as cyanogen bromide activated carbohydrates and the systems reactive
substrates described in U.S. Patent Nos. 3,959,642; 3,969,287; 3,691,016; 4,195,128; 4,247,642; 4,229,537;
4,055,635; and 4,330,440 are employed for protein immobilization and cross-linking.

Certain post-translational modifications are the result of the action of recombinant host cells on the 
expressed polypeptide. Glutaminyl and asparaginyl residues are frequently post-translationally deamidated to
the corresponding glutamyl and aspartyl residues. Alternatively, these residues are deamidated under mildly 
acidic conditions. Either form of these residues falls within the scope of this invention. 

Other post-translational modifications include hydroxylation of proline and lysine, phosphorylation of
hydroxyl groups of seryl, threonyl or tyrosyl residues, and methylation of the α-amino groups of lysine, arginine, and
histidine side chains [T.E. Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San
Francisco, pp. 79-86 (1983)].

Further derivatives of the type C lectins herein are the so-called "immunoadhesins", which are chimeric
antibody-like molecules combining the functional domain(s) of a binding protein (usually a receptor, a
cell-adhesion molecule or a ligand) with an immunoglobulin sequence. The most common example of this type
of fusion protein combines the hinge and Fc regions of an immunoglobulin (Ig) with domains of a cell-surface
receptor that recognizes a specific ligand. This type of molecule is called an "immunoadhesin", because it
combines "immune" and "adhesion" functions; other frequently used names are "Ig-chimera", "Ig-" or "Fc-fusion
protein", or "receptor-globulin."

To date, more than fifty immunoadhesins have been reported in the art. Immunoadhesins reported in
the literature include, for example, fusions of the T cell receptor (Gascoigne et al., Proc. Natl. Acad. Sci. USA
84, 2936-2940 [1987]); CD4 (Capon et al., Nature 337, 525-531 [1989]; Traunecker et al., Nature 339, 68-70
[1989]; Zettmeissl et al., DNA Cell Biol. USA 9, 347-353 [1990]; Byrn et al., Nature 344, 667-670 [1990]);
L-selectin (homing receptor) (Watson et al., J. Cell. Biol. 110, 2221-2229 [1990]; Watson et al., Nature 349,
164-167 [1991]); E-selectin (Mulligan et al., J. Immunol. 151, 6410-17 [1993]; Jacob et al., Biochemistry 34,
1210-1217 [1995]); P-selectin (Mulligan et al., supra; Hollenbaugh et al., Biochemistry 34, 5678-84 [1995]); ICAM-1
(Staunton et al., J. Exp. Med. 176, 1471-1476 [1992]; Martin et al., J. Virol. 67, 3561-68 [1993]; Roep et al.,
Lancet 343, 1590-93 [1994]); ICAM-2 (Damle et al., J. Immunol. 148, 665-71 [1992]); ICAM-3 (Holness et al.,
J. Biol. Chem. 270, 877-84 [1995]); LFA-3 (Kanner et al., J. Immunol. 148, 2-23-29 [1992]); L1 glycoprotein
(Doherty et al., Neuron 14, 57-66 [1995]); TNF-R1 (Ashkenazi et al., Proc. Natl. Acad. Sci. USA 88, 10535-539
[1991]; Lesslauer et al., Eur. J. Immunol. 21, 2883-86 [1991]; Peppel et al., J. Exp. Med. 174, 1483-1489
[1991]); TNF-R2 (Zack et al., Proc. Natl. Acad. Sci. USA 90, 2335-39 [1993]; Wooley et al., J. Immunol. 151,
6602-07 [1993]); CD44 (Aruffo et al., Cell 61, 1303-1313 [1990]); CD28 and B7 (Linsley et al., J. Exp. Med.
173, 721-730 [1991]); CTLA-4 (Linsley et al., J. Exp. Med. 174, 561-569 [1991]); CD22 (Stamenkovic et al.,
Cell 66, 1133-1144 [1991]); NP receptors (Bennett et al., J. Biol. Chem. 266, 23060-23067 [1991]); IgE
receptor α (Ridgway and Gorman, J. Cell. Biol. 115, abstr. 1448 [1991]); HGF receptor (Mark, M.R. et al.,
1992, J. Biol. Chem. submitted); IFN-γR α- and β-chain (Marsters et al., Proc. Natl. Acad. Sci. USA 92,
5401-05 [1995]); trk-A, -B, and -C (Shelton et al., J. Neurosci. 15, 477-91 [1995]); IL-2 (Landolfi, J. Immunol. 146,
915-19 [1991]); IL-10 (Zheng et al., J. Immunol. 154, 5590-5600 [1995]).

The simplest and most straightforward immunoadhesin design combines the binding region(s) of the
'adhesin' protein with the hinge and Fc regions of an immunoglobulin heavy chain. Ordinarily, when preparing
the lectin-immunoglobulin chimeras of the present invention, nucleic acid encoding the desired type C lectin
polypeptide will be fused C-terminally to nucleic acid encoding the N-terminus of an immunoglobulin constant
domain sequence; however, N-terminal fusions are also possible. Typically, in such fusions the encoded chimeric
polypeptide will retain at least functionally active hinge, CH2 and CH3 domains of the constant region of an
immunoglobulin heavy chain. Fusions are also made to the C-terminus of the Fc portion of a constant domain,
or immediately N-terminal to the CH1 of the heavy chain or the corresponding region of the light chain. The
precise site at which the fusion is made is not critical; particular sites are well known and may be selected in
order to optimize the biological activity, secretion or binding characteristics of the lectin-immunoglobulin
chimeras.

In a preferred embodiment, the sequence of a native, mature lectin polypeptide, or a soluble
(transmembrane domain-inactivated) form thereof, is fused to the N-terminus of the C-terminal portion of an
antibody (in particular the Fc domain), containing the effector functions of an immunoglobulin, e.g. IgG-1. It
is possible to fuse the entire heavy chain constant region to the lectin sequence. However, more preferably, a
sequence beginning in the hinge region just upstream of the papain cleavage site (which defines IgG Fc
chemically; residue 216, taking the first residue of heavy chain constant region to be 114 [Kobet et al., supra],
or analogous sites of other immunoglobulins) is used in the fusion. In a particularly preferred embodiment, the
type C lectin sequence (full length or soluble) is fused to the hinge region and CH2 and CH3 or CH1, hinge, CH2
and CH3 domains of an IgG-1, IgG-2, or IgG-3 heavy chain. The precise site at which the fusion is made is not
critical, and the optimal site can be determined by routine experimentation.

In some embodiments, the lectin-immunoglobulin chimeras are assembled as multimers, and particularly
as homo-dimers or -tetramers (WO 91/08298). Generally, these assembled immunoglobulins will have known
unit structures. A basic four chain structural unit is the form in which IgG, IgD, and IgE exist. A four chain unit is
repeated in the higher molecular weight immunoglobulins; IgM generally exists as a pentamer of basic four chain units
held together by disulfide bonds. IgA globulin, and occasionally IgG globulin, may also exist in multimeric form
in serum. In the case of a multimer, each four chain unit may be the same or different.

Various exemplary assembled lectin-immunoglobulin chimeras within the scope herein are
schematically diagrammed below:

    (a) AC_L-AC_L;
    (b) AC_H-[AC_H, AC_L-AC_H, AC_L-V_HC_H, or V_LC_L-AC_H];
    (c) AC_L-AC_H-[AC_L-AC_H, AC_L-V_HC_H, V_LC_L-AC_H, or V_LC_L-V_HC_H];
    (d) AC_L-V_HC_H-[AC_H, or AC_L-V_HC_H, or V_LC_L-AC_H];
    (e) V_LC_L-AC_H-[AC_L-V_HC_H, or V_LC_L-AC_H]; and
    (f) [A-Y]_n-[V_LC_L-V_HC_H]_2,

wherein

    each A represents identical or different novel type C lectin polypeptide amino acid sequences;
    V_L is an immunoglobulin light chain variable domain;
    V_H is an immunoglobulin heavy chain variable domain;
    C_L is an immunoglobulin light chain constant domain;
    C_H is an immunoglobulin heavy chain constant domain;
    n is an integer greater than 1;
    Y designates the residue of a covalent cross-linking agent.

In the interests of brevity, the foregoing structures only show key features; they do not indicate joining
(J) or other domains of the immunoglobulins, nor are disulfide bonds shown. However, where such domains are
required for binding activity, they shall be construed as being present in the ordinary locations which they
occupy in the immunoglobulin molecules.

Alternatively, the type C lectin amino acid sequences can be inserted between immunoglobulin heavy
chain and light chain sequences such that an immunoglobulin comprising a chimeric heavy chain is obtained.
In this embodiment, the type C lectin polypeptide sequences are fused to the 3' end of an immunoglobulin heavy
chain in each arm of an immunoglobulin, either between the hinge and the CH2 domain, or between the CH2 and
CH3 domains. Similar constructs have been reported by Hoogenboom, H. R. et al., Mol. Immunol. 28,
1027-1037 (1991).

Although the presence of an immunoglobulin light chain is not required in the immunoadhesins of the
present invention, an immunoglobulin light chain might be present either covalently associated to a type C
lectin-immunoglobulin heavy chain fusion polypeptide, or directly fused to the type C lectin polypeptide. In
the former case, DNA encoding an immunoglobulin light chain is typically coexpressed with the DNA encoding
the type C lectin-immunoglobulin heavy chain fusion protein. Upon secretion, the hybrid heavy chain and the
light chain will be covalently associated to provide an immunoglobulin-like structure comprising two
disulfide-linked immunoglobulin heavy chain-light chain pairs. Methods suitable for the preparation of such structures are,
for example, disclosed in U.S. Patent No. 4,816,567 issued 28 March 1989.

In a preferred embodiment, the immunoglobulin sequences used in the construction of the
immunoadhesins of the present invention are from an IgG immunoglobulin heavy chain constant domain. For
human immunoadhesins, the use of human IgG-1 and IgG-3 immunoglobulin sequences is preferred. A major
advantage of using IgG-1 is that IgG-1 immunoadhesins can be purified efficiently on immobilized protein A.
In contrast, purification of IgG-3 requires protein G, a significantly less versatile medium. However, other
structural and functional properties of immunoglobulins should be considered when choosing the Ig fusion
partner for a particular immunoadhesin construction. For example, the IgG-3 hinge is longer and more flexible,
so it can accommodate larger 'adhesin' domains that may not fold or function properly when fused to IgG-1.
While IgG immunoadhesins are typically mono- or bivalent, other Ig subtypes like IgA and IgM may give rise
to dimeric or pentameric structures, respectively, of the basic Ig homodimer unit. Multimeric immunoadhesins
are advantageous in that they can bind their respective targets with greater avidity than their IgG-based
counterparts. Reported examples of such structures are CD4-IgM (Traunecker et al., supra); ICAM-IgM (Martin
et al., J. Virol. 67, 3561-68 [1993]); and CD2-IgM (Arulanandam et al., J. Exp. Med. 177, 1439-50 [1993]).

For type C lectin-Ig immunoadhesins, which are designed for in vivo application, the pharmacokinetic
properties and the effector functions specified by the Fc region are important as well. Although IgG-1, IgG-2
and IgG-4 all have in vivo half-lives of 21 days, their relative potencies at activating the complement system are
different. IgG-4 does not activate complement, and IgG-2 is significantly weaker at complement activation than
IgG-1. Moreover, unlike IgG-1, IgG-2 does not bind to Fc receptors on mononuclear cells or neutrophils. While
IgG-3 is optimal for complement activation, its in vivo half-life is approximately one third of the other IgG
isotypes. Another important consideration for immunoadhesins designed to be used as human therapeutics is
the number of allotypic variants of the particular isotype. In general, IgG isotypes with fewer
serologically-defined allotypes are preferred. For example, IgG-1 has only four serologically-defined allotypic sites, two of
which (G1m1 and 2) are located in the Fc region; and one of these sites, G1m1, is non-immunogenic. In contrast,
there are 12 serologically-defined allotypes in IgG-3, all of which are in the Fc region; only three of these sites
(G3m5, 11 and 21) have one allotype which is nonimmunogenic. Thus, the potential immunogenicity of a γ3
immunoadhesin is greater than that of a γ1 immunoadhesin.

Type C lectin-Ig immunoadhesins are most conveniently constructed by fusing the cDNA sequence
encoding the type C lectin portion in-frame to an Ig cDNA sequence. However, fusion to genomic Ig fragments
can also be used (see, e.g. Gascoigne et al., Proc. Natl. Acad. Sci. USA 84, 2936-2940 [1987]; Aruffo et al., Cell
61, 1303-1313 [1990]; Stamenkovic et al., Cell 66, 1133-1144 [1991]). The latter type of fusion requires the
presence of Ig regulatory sequences for expression. cDNAs encoding IgG heavy-chain constant regions can be
isolated based on published sequence from cDNA libraries derived from spleen or peripheral blood lymphocytes,
by hybridization or by polymerase chain reaction (PCR) techniques.

Other derivatives of the novel type C lectins of the present invention, which possess a longer half-life
than the native molecules, comprise the lectin or a lectin-immunoglobulin chimera covalently bonded to a
nonproteinaceous polymer. The nonproteinaceous polymer ordinarily is a hydrophilic synthetic polymer, i.e.,
a polymer not otherwise found in nature. However, polymers which exist in nature and are produced by
recombinant or in vitro methods are useful, as are polymers which are isolated from native sources. Hydrophilic
polyvinyl polymers fall within the scope of this invention, e.g. polyvinylalcohol and polyvinylpyrrolidone.
Particularly useful are polyalkylene ethers such as polyethylene glycol (PEG); polyoxyalkylenes such as
polyoxyethylene, polyoxypropylene, and block copolymers of polyoxyethylene and polyoxypropylene
(Pluronics); polymethacrylates; carbomers; branched or unbranched polysaccharides which comprise the
saccharide monomers D-mannose, D- and L-galactose, fucose, fructose, D-xylose, L-arabinose, D-glucuronic
acid, sialic acid, D-galacturonic acid, D-mannuronic acid (e.g. polymannuronic acid, or alginic acid),
D-glucosamine, D-galactosamine, D-glucose and neuraminic acid including homopolysaccharides and
heteropolysaccharides such as lactose, amylopectin, starch, hydroxyethyl starch, amylose, dextran sulfate,
dextran, dextrins, glycogen, or the polysaccharide subunit of acid mucopolysaccharides, e.g. hyaluronic acid;
polymers of sugar alcohols such as polysorbitol and polymannitol; heparin or heparon. The polymer prior to
cross-linking need not be, but preferably is, water soluble, but the final conjugate must be water soluble. In
addition, the polymer should not be highly immunogenic in the conjugate form, nor should it possess viscosity
that is incompatible with intravenous infusion or injection if it is intended to be administered by such routes.

Preferably the polymer contains only a single group which is reactive. This helps to avoid cross-linking
of protein molecules. However, it is within the scope herein to optimize reaction conditions to reduce
cross-linking, or to purify the reaction products through gel filtration or chromatographic sieves to recover
substantially homogeneous derivatives.

The molecular weight of the polymer may desirably range from about 100 to 500,000, and preferably
is from about 1,000 to 20,000. The molecular weight chosen will depend upon the nature of the polymer and
the degree of substitution. In general, the greater the hydrophilicity of the polymer and the greater the degree
of substitution, the lower the molecular weight that can be employed. Optimal molecular weights will be
determined by routine experimentation.

The polymer generally is covalently linked to the novel type C lectin or to the lectin-immunoglobulin
chimeras through a multifunctional crosslinking agent which reacts with the polymer and one or more amino acid
or sugar residues of the type C lectin or lectin-immunoglobulin chimera to be linked. However, it is within the
scope of the invention to directly crosslink the polymer by reacting a derivatized polymer with the hybrid, or
vice versa.

The covalent crosslinking site on the type C lectin or lectin-Ig includes the N-terminal amino group and
epsilon amino groups found on lysine residues, as well as other amino, imino, carboxyl, sulfhydryl, hydroxyl or
other hydrophilic groups. The polymer may be covalently bonded directly to the hybrid without the use of a
multifunctional (ordinarily bifunctional) crosslinking agent. Covalent binding to amino groups is accomplished
by known chemistries based upon cyanuric chloride, carbonyl diimidazole, aldehyde reactive groups (PEG
alkoxide plus diethyl acetal of bromoacetaldehyde; PEG plus DMSO and acetic anhydride; or PEG chloride plus
the phenoxide of 4-hydroxybenzaldehyde), succinimidyl active esters, activated dithiocarbonate PEG, and
2,4,5-trichlorophenylchloroformate or p-nitrophenylchloroformate activated PEG. Carboxyl groups are
derivatized by coupling PEG-amine using carbodiimide.

Polymers are conjugated to oligosaccharide groups by oxidation using chemicals, e.g. metaperiodate,
or enzymes, e.g. glucose or galactose oxidase (either of which produces the aldehyde derivative of the
carbohydrate), followed by reaction with hydrazide or amino derivatized polymers, in the same fashion as is
described by Heitzmann et al., P.N.A.S. 71, 3537-41 (1974) or Bayer et al., Methods in Enzymology 62, 310
(1979), for the labeling of oligosaccharides with biotin or avidin. Further, other chemical or enzymatic methods
which have been used heretofore to link oligosaccharides are particularly advantageous because, in general, there
are fewer substitutions than amino acid sites for derivatization, and the oligosaccharide products thus will be
more homogeneous. The oligosaccharide substituents also are optionally modified by enzyme digestion to remove
sugars, e.g. by neuraminidase digestion, prior to polymer derivatization.

The polymer will bear a group which is directly reactive with an amino acid side chain, or the N- or
C-terminus of the polypeptide linked, or which is reactive with the multifunctional cross-linking agent. In general,
polymers bearing such reactive groups are known for the preparation of immobilized proteins. In order to use
such chemistries here, one should employ a water soluble polymer otherwise derivatized in the same fashion as
insoluble polymers heretofore employed for protein immobilization. Cyanogen bromide activation is a
particularly useful procedure to employ in crosslinking polysaccharides.

"Water soluble" in reference to the starting polymer means that the polymer or its reactive intermediate
used for conjugation is sufficiently water soluble to participate in a derivatization reaction.

"Water soluble" in reference to the polymer conjugate means that the conjugate is soluble in
physiological fluids such as blood.

The degree of substitution with such a polymer will vary depending upon the number of reactive sites
on the protein, whether all or a fragment of the protein is used, whether the protein is a fusion with a heterologous
protein (e.g. a type C lectin-immunoglobulin chimera), the molecular weight, hydrophilicity and other
characteristics of the polymer, and the particular protein derivatization sites chosen. In general, the conjugate
contains from about 1 to 10 polymer molecules, while any heterologous sequence may be substituted with an
essentially unlimited number of polymer molecules so long as the desired activity is not significantly adversely
affected. The optimal degree of cross-linking is easily determined by an experimental matrix in which the time,
temperature and other reaction conditions are varied to change the degree of substitution, after which the ability
of the conjugates to function in the desired fashion is determined.

The polymer, e.g. PEG, is cross-linked by a wide variety of methods known per se for the covalent
modification of proteins with nonproteinaceous polymers such as PEG. Certain of these methods, however, are
not preferred for the purposes herein. Cyanuric chloride chemistry leads to many side reactions, including
protein cross-linking. In addition, it may be particularly likely to lead to inactivation of proteins containing
sulfhydryl groups. Carbonyl diimidazole chemistry (Beauchamp et al., Anal. Biochem. 131, 25-33 [1983])
requires high pH (>8.5), which can inactivate proteins. Moreover, since the "activated PEG" intermediate can
react with water, a very large molar excess of "activated PEG" over protein is required. The high concentrations
of PEG required for the carbonyl diimidazole chemistry also lead to problems in purification, as both gel filtration
chromatography and hydrophilic interaction chromatography are adversely affected. In addition, the high
concentrations of "activated PEG" may precipitate protein, a problem that per se has been noted previously
(Davis, U.S. Patent No. 4,179,337). On the other hand, aldehyde chemistry (Royer, U.S. Patent No. 4,002,531)
is more efficient since it requires only a 40-fold molar excess of PEG and a 1-2 hr incubation. However, the
manganese dioxide suggested by Royer for preparation of the PEG aldehyde is problematic "because of the
pronounced tendency of PEG to form complexes with metal-based oxidizing agents" (Harris et al., J. Polym.
Sci. Polym. Chem. Ed. 22, 341-52 [1984]). The use of a Moffatt oxidation, utilizing DMSO and acetic
anhydride, obviates this problem. In addition, the sodium borohydride suggested by Royer must be used at high
pH and has a significant tendency to reduce disulfide bonds. In contrast, sodium cyanoborohydride, which is
effective at neutral pH and has very little tendency to reduce disulfide bonds, is preferred.

The long half-life conjugates of this invention are separated from the unreacted starting materials by
gel filtration. Heterologous species of the conjugates are purified from one another in the same fashion. The
polymer also may be water-insoluble, as a hydrophilic gel.

The novel type C lectins may be entrapped in microcapsules prepared, for example, by coacervation
techniques or by interfacial polymerization, in colloidal drug delivery systems (e.g. liposomes, albumin
microspheres, microemulsions, nano-particles and nanocapsules), or in macroemulsions. Such techniques are
disclosed in Remington's Pharmaceutical Sciences, 16th Edition, Osol, A., Ed. (1980).

H. Antibody preparation 

(i) Polyclonal antibodies 

Polyclonal antibodies to a type C lectin of the present invention generally are raised in animals by
multiple subcutaneous (sc) or intraperitoneal (ip) injections of the type C lectin and an adjuvant. It may be useful
to conjugate the lectin or a fragment containing the target amino acid sequence to a protein that is immunogenic
in the species to be immunized, e.g. keyhole limpet hemocyanin, serum albumin, bovine thyroglobulin, or
soybean trypsin inhibitor, using a bifunctional or derivatizing agent, for example maleimidobenzoyl
sulfosuccinimide ester (conjugation through cysteine residues), N-hydroxysuccinimide (through lysine residues),
glutaraldehyde, succinic anhydride, SOCl2, or R1N=C=NR, where R and R1 are different alkyl groups.

Animals are immunized against the immunogenic conjugates or derivatives by combining 1 mg or 1 µg
of conjugate (for rabbits or mice, respectively) with 3 volumes of Freund's complete adjuvant and injecting the
solution intradermally at multiple sites. One month later the animals are boosted with 1/5 to 1/10 the original
amount of conjugate in Freund's complete adjuvant by subcutaneous injection at multiple sites. Seven to 14 days
later the animals are bled and the serum is assayed for anti-type C lectin antibody titer. Animals are boosted until
the titer plateaus. Preferably, the animal is boosted with the conjugate of the same type C lectin, but conjugated
to a different protein and/or through a different cross-linking reagent. Conjugates also can be made in recombinant
cell culture as protein fusions. Also, aggregating agents such as alum are used to enhance the immune response.
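
The dosing arithmetic in the immunization schedule above can be sketched as follows. This is only an illustration of the stated fractions (a boost of 1/10 to 1/5 of the priming amount); the function name and dictionary are hypothetical and not part of the disclosure.

```python
# Priming amounts of conjugate from the text: 1 mg for rabbits, 1 ug for mice.
PRIMING_DOSE_UG = {"rabbit": 1000.0, "mouse": 1.0}

def boost_range_ug(priming_ug):
    """Return (low, high) boost amounts: 1/10 to 1/5 of the priming amount."""
    return priming_ug / 10, priming_ug / 5

for species, dose in PRIMING_DOSE_UG.items():
    low, high = boost_range_ug(dose)
    print(f"{species}: boost with {low:g} to {high:g} ug of conjugate")
```

Under these figures a rabbit boost is 100 to 200 µg and a mouse boost is 0.1 to 0.2 µg of conjugate.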

(ii) Monoclonal antibodies

Monoclonal antibodies are obtained from a population of substantially homogeneous antibodies, i.e.,
the individual antibodies comprising the population are identical except for possible naturally-occurring
mutations that may be present in minor amounts. Thus, the modifier "monoclonal" indicates the character of the
antibody as not being a mixture of discrete antibodies. For example, the anti-type C lectin monoclonal
antibodies of the invention may be made using the hybridoma method first described by Kohler & Milstein,
Nature 256:495 (1975), or may be made by recombinant DNA methods [Cabilly et al., U.S. Pat. No. 4,816,567].

DNA encoding the monoclonal antibodies of the invention is readily isolated and sequenced using
conventional procedures (e.g., by using oligonucleotide probes that are capable of binding specifically to genes
encoding the heavy and light chains of murine antibodies). The hybridoma cells of the invention serve as a
preferred source of such DNA. Once isolated, the DNA may be placed into expression vectors, which are then
transfected into host cells such as simian COS cells, Chinese hamster ovary (CHO) cells, or myeloma cells that
do not otherwise produce immunoglobulin protein, to obtain the synthesis of monoclonal antibodies in the
recombinant host cells. The DNA also may be modified, for example, by substituting the coding sequence for
human heavy and light chain constant domains in place of the homologous murine sequences (Morrison et al.,
Proc. Nat. Acad. Sci. 81, 6851 (1984)), or by covalently joining to the immunoglobulin coding sequence all or
part of the coding sequence for a non-immunoglobulin polypeptide. In that manner, "chimeric" or "hybrid"
antibodies are prepared that have the binding specificity of an anti-type C lectin monoclonal antibody herein.

Typically such non-immunoglobulin polypeptides are substituted for the constant domains of an
antibody of the invention, or they are substituted for the variable domains of one antigen-combining site of an
antibody of the invention to create a chimeric bivalent antibody comprising one antigen-combining site having
specificity for a type C lectin and another antigen-combining site having specificity for a different antigen.

Chimeric or hybrid antibodies also may be prepared in vitro using known methods in synthetic protein
chemistry, including those involving crosslinking agents. For example, immunotoxins may be constructed using
a disulfide exchange reaction or by forming a thioether bond. Examples of suitable reagents for this purpose
include iminothiolate and methyl-4-mercaptobutyrimidate.

For diagnostic applications, the antibodies of the invention typically will be labeled with a detectable
moiety. The detectable moiety can be any one which is capable of producing, either directly or indirectly, a
detectable signal. For example, the detectable moiety may be a radioisotope, such as 3H, 14C, 32P, 35S, or 125I;
a fluorescent or chemiluminescent compound, such as fluorescein isothiocyanate, rhodamine, or luciferin; biotin;
radioactive isotopic labels, such as, e.g., 125I, 32P, 14C, or 3H; or an enzyme, such as alkaline phosphatase,
beta-galactosidase or horseradish peroxidase.

Any method known in the art for separately conjugating the antibody to the detectable moiety may be
employed, including those methods described by Hunter et al., Nature 144:945 (1962); David et al.,
Biochemistry 13:1014 (1974); Pain et al., J. Immunol. Meth. 40:219 (1981); and Nygren, J. Histochem. and
Cytochem. 30:407 (1982).

The antibodies of the present invention may be employed in any known assay method, such as
competitive binding assays, direct and indirect sandwich assays, and immunoprecipitation assays. Zola,
Monoclonal Antibodies: A Manual of Techniques, pp. 147-158 (CRC Press, Inc., 1987).

(iii) Humanized antibodies

Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized
antibody has one or more amino acid residues introduced into it from a source which is non-human. These
non-human amino acid residues are often referred to as "import" residues, which are typically taken from an
"import" variable domain. Humanization can be essentially performed following the method of Winter and
co-workers [Jones et al., Nature 321, 522-525 (1986); Riechmann et al., Nature 332, 323-327 (1988); Verhoeyen
et al., Science 239, 1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding
sequences of a human antibody. Accordingly, such "humanized" antibodies are chimeric antibodies (Cabilly,
supra), wherein substantially less than an intact human variable domain has been substituted by the corresponding
sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which
some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent
antibodies.

It is important that antibodies be humanized with retention of high affinity for the antigen and other
favorable biological properties. To achieve this goal, according to a preferred method, humanized antibodies
are prepared by a process of analysis of the parental sequences and various conceptual humanized products using
three dimensional models of the parental and humanized sequences. Three dimensional immunoglobulin models
are commonly available and are familiar to those skilled in the art. Computer programs are available which
illustrate and display probable three-dimensional conformational structures of selected candidate immunoglobulin
sequences. Inspection of these displays permits analysis of the likely role of the residues in the functioning of
the candidate immunoglobulin sequence, i.e. the analysis of residues that influence the ability of the candidate
immunoglobulin to bind its antigen. In this way, FR residues can be selected and combined from the consensus
and import sequence so that the desired antibody characteristic, such as increased affinity for the target
antigen(s), is achieved. In general, the CDR residues are directly and most substantially involved in influencing
antigen binding. For further details see PCT Pub. WO 94/04679 published 03 March 1994, which is a
continuation-in-part of PCT Pub. WO 92/22653 published 23 December 1992.

Alternatively, it is now possible to produce transgenic animals (e.g. mice) that are capable, upon
immunization, of producing a full repertoire of human antibodies in the absence of endogenous immunoglobulin
production. For example, it has been described that the homozygous deletion of the antibody heavy chain joining
region (JH) gene in chimeric and germ-line mutant mice results in complete inhibition of endogenous antibody
production. Transfer of the human germ-line immunoglobulin gene array in such germ-line mutant mice will
result in the production of human antibodies upon antigen challenge. See, e.g. Jakobovits et al., Proc. Natl.
Acad. Sci. USA 90, 2551-255 (1993); Jakobovits et al., Nature 362, 255-258 (1993).

(iv) Bispecific antibodies

Bispecific antibodies are monoclonal, preferably human or humanized, antibodies that have binding
specificities for at least two different antigens. In the present case, one of the binding specificities is for a type
C lectin of the present invention; the other one is for any other antigen, for example, another member of the
endocytic type C lectin family, or a selectin, such as E-, L- or P-selectin. Such constructs can also be referred
to as bispecific immunoadhesins. Methods for making bispecific antibodies (and bispecific immunoadhesins)
are known in the art.

Traditionally, the recombinant production of bispecific antibodies is based on the coexpression of two
immunoglobulin heavy chain-light chain pairs, where the two heavy chains have different specificities (Milstein
and Cuello, Nature 305, 537-539 (1983)). Because of the random assortment of immunoglobulin heavy and light
chains, these hybridomas (quadromas) produce a potential mixture of 10 different antibody molecules, of which
only one has the correct bispecific structure. The purification of the correct molecule, which is usually done by
affinity chromatography steps, is rather cumbersome, and the product yields are low. Similar procedures are
disclosed in PCT application publication No. WO 93/08829 (published 13 May 1993), and in Traunecker et al.,
EMBO J. 10, 3655-3659 (1991).
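
The figure of 10 possible molecules follows from simple counting, which the sketch below reproduces. It assumes, for illustration only, that each heavy chain can pair with either light chain and that a complete antibody is an unordered pair of heavy-light half-molecules; the chain labels H1, H2, L1 and L2 are hypothetical.

```python
from itertools import combinations_with_replacement, product

heavy = ["H1", "H2"]  # two heavy chains with different specificities
light = ["L1", "L2"]  # their cognate light chains

# Each half-molecule is one heavy chain paired with one light chain (4 options).
half_molecules = list(product(heavy, light))

# A complete antibody is an unordered pair of half-molecules.
antibodies = list(combinations_with_replacement(half_molecules, 2))

# The desired bispecific joins each heavy chain to its own cognate light chain.
correct = {("H1", "L1"), ("H2", "L2")}
bispecific = [ab for ab in antibodies if set(ab) == correct]

print(len(antibodies), len(bispecific))
```

The script prints 10 and 1: ten distinct assortments, of which one has the correct bispecific structure.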

According to a different and more preferred approach, antibody variable domains with the desired
binding specificities (antibody-antigen combining sites) are fused to immunoglobulin constant domain sequences.
The fusion preferably is with an immunoglobulin heavy chain constant domain, comprising at least part of the
hinge, and second and third constant regions of an immunoglobulin heavy chain (CH2 and CH3). It is preferred
to have the first heavy chain constant region (CH1), containing the site necessary for light chain binding, present
in at least one of the fusions. DNAs encoding the immunoglobulin heavy chain fusions and, if desired, the
immunoglobulin light chain, are inserted into separate expression vectors, and are cotransfected into a suitable
host organism. This provides for great flexibility in adjusting the mutual proportions of the three polypeptide
fragments in embodiments when unequal ratios of the three polypeptide chains used in the construction provide
the optimum yields. It is, however, possible to insert the coding sequences for two or all three polypeptide chains
in one expression vector when the expression of at least two polypeptide chains in equal ratios results in high
yields or when the ratios are of no particular significance. In a preferred embodiment of this approach, the
bispecific antibodies are composed of a hybrid immunoglobulin heavy chain with a first binding specificity in
one arm, and a hybrid immunoglobulin heavy chain-light chain pair (providing a second binding specificity) in
the other arm. It was found that this asymmetric structure facilitates the separation of the desired bispecific
compound from unwanted immunoglobulin chain combinations, as the presence of an immunoglobulin light
chain in only one half of the bispecific molecule provides for a facile way of separation. This approach is
disclosed in PCT application WO 94/04690 published 3 March 1994.

For further details of generating bispecific antibodies see, for example, Suresh et al., Methods in
Enzymology 121, 210 (1986).

(v) Heteroconjugate antibodies 

Heteroconjugate antibodies are also within the scope of the present invention. Heteroconjugate
antibodies are composed of two covalently joined antibodies. Such antibodies have, for example, been proposed
to target immune system cells to unwanted cells (U.S. Patent No. 4,676,980), and for treatment of HIV infection
(PCT application publication Nos. WO 91/00360 and WO 92/200373; EP 03089). Heteroconjugate antibodies
may be made using any convenient cross-linking methods. Suitable cross-linking agents are well known in the
art, and are disclosed in U.S. Patent No. 4,676,980, along with a number of cross-linking techniques.

I. Peptide and non-peptide analogs

Peptide analogs of the type C lectins of the present invention are modelled based upon the
three-dimensional structure of the native polypeptides. Peptides may be synthesized by well known techniques
such as the solid-phase synthetic techniques initially described in Merrifield, J. Am. Chem. Soc. 85, 2149-2154
(1963). Other peptide synthesis techniques are, for example, described in Bodanszky et al., Peptide Synthesis,
John Wiley & Sons, 2nd Ed., 1976, as well as in other reference books readily available to those skilled in the
art. A summary of peptide synthesis techniques may be found in Stuart and Young, Solid Phase Peptide
Synthesis, Pierce Chemical Company, Rockford, IL (1984). Peptides may also be prepared by recombinant DNA
technology, using a DNA sequence encoding the desired peptide.

In addition to peptide analogs, the present invention also contemplates non-peptide (e.g. organic)
compounds which display substantially the same surface as the peptide analogs of the present invention, and
therefore interact with other molecules in a similar fashion.

J. Use of the type C lectins

Amino acid sequence variants of the native type C lectins of the present invention may be employed
therapeutically to compete with the normal binding of the native proteins to their ligands. The type C lectin
amino acid sequence variants are, therefore, useful as competitive inhibitors of the biological activity of native
type C lectins.

Native type C lectins and their amino acid sequence variants are useful in the identification and
purification of their native ligands. The purification is preferably performed by immunoadhesins comprising a
type C lectin amino acid sequence retaining the qualitative ability of a native type C lectin of the present
invention to recognize its native carbohydrate ligand.

The native type C lectins of the present invention are further useful as molecular markers of the tissues 
in which they are expressed. 

Furthermore, the type C lectins of the present invention provide valuable sequence motifs which can
be inserted or substituted into other native members of the endocytic type C lectins, such as a native mannose
receptor, DEC205 receptor, or phospholipase A2 receptor. The alteration of these native proteins by the
substitution or insertion of sequences from the novel type C lectins of the present invention can yield variant
molecules with altered biological properties, such as ligand binding affinity or ligand specificity. For example,
one or more lectin domains of another member of the endocytic type C lectin family may be entirely or partially
replaced by lectin domain sequences derived from the type C lectins of the present invention. Similarly,
fibronectin type II domain sequences from the type C lectins herein may be substituted or inserted into the amino
acid sequences of other type C lectins.

Nucleic acid encoding the type C lectins of the present invention is also useful in providing
hybridization probes for searching cDNA and genomic libraries for the coding sequence of other type C lectins.
Further details of the invention will be apparent from the following non-limiting example. 
Example 

New murine and human type C lectins
A. Materials and Methods
1. Isolation of cDNAs coding the murine and human lectins.

According to the EST sequence, two 33-mers were synthesized (5' CCG GAA TTC CGG TTT GTT
GCC ACT GGG AGC AGG 3' (SEQ. ID. NO: 10) and 5' CCC AAG CTT GAA GTG GTC AGA GGC ACA
GTT CTC 3' (SEQ. ID. NO: 11)) for PCR (94°C, 1 min; 60°C, 1 min; and 72°C, 1 min, for 35 cycles) using 5
microliters of a human heart cDNA library (Clontech) as template. The 260-base PCR product was cloned (TA
cloning kit, Invitrogen) and used as a probe to screen a human heart cDNA library as well as to probe Northern
and Southern blots (Clontech). The same pair of primers was also used to amplify a mouse heart cDNA library
with a lower annealing temperature (55°C), and a mouse product of the same size (260 bp) was obtained.
Screening of approximately 500,000 plaques from cDNA libraries was done using standard procedures with a
randomly-labelled DNA probe. Single positive phage clones were isolated after two more rounds of rescreening.
The size of the inserts was identified by PCR using two primers from the lambda gt10 vector and the inserts were
subcloned. DNA sequencing was performed on an Applied Biosystems automated DNA sequencer. To clone
the 5 prime region of the transcripts, 5' RACE (Rapid Amplification of cDNA Ends) was performed using the
most 5' end of the known sequence, and the protocol for 5' RACE supplied by the manufacturer (Marathon-Ready
cDNAs, Clontech) was followed. RACE products were subcloned and sequenced as described.
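
As a quick consistency check on the two primers given above, the sketch below confirms that each is a 33-mer and scans for restriction enzyme recognition sequences near the 5' ends. The text does not name any enzymes; EcoRI (GAATTC) and HindIII (AAGCTT) are included here only because their well-known recognition sequences occur in the primers, and the helper function is hypothetical.

```python
# Primer sequences from the text (SEQ. ID. NOS: 10 and 11), spaces removed.
PRIMERS = {
    "SEQ_ID_NO_10": "CCGGAATTCCGGTTTGTTGCCACTGGGAGCAGG",
    "SEQ_ID_NO_11": "CCCAAGCTTGAAGTGGTCAGAGGCACAGTTCTC",
}

# Recognition sequences of two common cloning enzymes (assumed relevant here;
# the patent text itself does not state which enzymes were intended).
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

def describe(primer):
    """Return the length, GC fraction and recognition sites found in a primer."""
    gc = sum(primer.count(base) for base in "GC") / len(primer)
    found = [name for name, site in SITES.items() if site in primer]
    return {"length": len(primer), "gc": round(gc, 2), "sites": found}

for name, seq in PRIMERS.items():
    print(name, describe(seq))
```

Both primers come out at 33 bases, with a GAATTC site in the first and an AAGCTT site in the second, which would be consistent with 5' tails added for directional cloning.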

2. Northern and Southern blot analyses

The DNA probes were prepared by agarose gel purification (Gel Extraction Kit, Qiagen) and random
labelling (Pharmacia). Blot hybridization was performed as described in the manufacturer's instructions using
commercially supplied blots (Clontech).

3. Characterization of the fetal liver transcript

Sequencing of the RACE products using human fetal liver Marathon-Ready cDNA (Clontech) as
template revealed a novel 5 prime region not found in the original heart-derived clones. To further characterize
this transcript, PCR was performed on heart, lung and fetal liver using a common downstream primer with two
different upstream primers. One upstream primer is from the lectin sequence, which is not present in the fetal
liver clone, and the other is from the fetal liver unique sequence. The PCR products were analysed on an agarose
gel and hybridized with an oligonucleotide common to both transcripts.

4. Isolation of genomic clones encoding the murine lectin 

A 129 mouse-derived embryonic stem (ES) cell genomic library was screened with two lectin
cDNA sequences. One is from the 5' end of the lectin coding sequence and the other one is from the 3' end of
the cDNA. Screening of 500,000 plaques yielded three kinds of lectin genomic clones: positive for the 5'-end
probe, for the 3'-end probe, and for both. Recombinant phage DNA was isolated from plate lysates (Wizard Lambda
Preps, Promega) and digested by Not I. Genomic DNA inserts were subcloned into a Not I-digested pBlueScript
SK vector using the Rapid DNA Ligation Kit (Boehringer Mannheim), after heat inactivation of the restriction
enzyme. The approximate locations of introns and exons were identified using dot-blot hybridization with
specific oligonucleotide probes and PCR analysis of lambda clones using exon-specific probes. Physical mapping
of the lectin gene was performed using restriction enzyme digestion of genomic clones followed by Southern blot
hybridization with exon-specific oligonucleotide probes.

5. In situ hybridization 

In situ hybridization was performed essentially as previously described (Lasky et al., Cell 69(6), 927-38
[1992]). Briefly, antisense and sense riboprobes for this clone were generated by use of the polymerase chain
reaction (PCR) to derive templates for subsequent in vitro transcription. In preparation for hybridization,
sections were treated sequentially with 4% paraformaldehyde (10 minutes) and proteinase K (0.5 mg/mL, 15
minutes) and then prehybridized with 50 mL of hybridization buffer at 42°C for 2 hours. Hybridization buffer
consisted of 10% dextran sulfate, 2X SSC (sodium chloride/sodium citrate) and 50% formamide. Probes were
added at a final concentration of 10^6 cpm/slide and the sections were incubated overnight at 55°C.
Posthybridization washes consisted of 2X SSC containing 1 mM EDTA, before and after a 30 minute treatment
with ribonuclease (20 mg/mL). A high-stringency wash consisting of 0.1X SSC containing EDTA was
performed in a large volume for 2 hours at 55°C. Sections were then washed in 0.5X SSC, dehydrated in
increasing concentrations of ethanol and then vacuum desiccated. Slides were covered with NTB2 nuclear
emulsion (Eastman Kodak, Rochester, NY) and exposed for up to 5 weeks. After the slides were developed they
were counterstained with hematoxylin and eosin and evaluated by epiluminescent microscopy for positive
hybridization. Serial sections of the tissues hybridized with the sense probes served as negative controls.
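
The hybridization buffer composition stated above (10% dextran sulfate, 2X SSC, 50% formamide) can be turned into pipetting volumes with the usual C1V1 = C2V2 dilution relation. The sketch below is illustrative only: the stock concentrations (50% dextran sulfate, 20X SSC, 100% formamide) are assumptions not given in the text, and the function name is hypothetical.

```python
def buffer_volumes(total_ml, targets, stocks):
    """Apply C1*V1 = C2*V2 per component; water makes up the remaining volume."""
    volumes = {name: total_ml * targets[name] / stocks[name] for name in targets}
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final concentrations")
    volumes["water"] = water
    return volumes

# Final concentrations from the text.
targets = {"dextran sulfate (%)": 10, "SSC (X)": 2, "formamide (%)": 50}
# Stock concentrations: assumed for illustration, not stated in the text.
stocks = {"dextran sulfate (%)": 50, "SSC (X)": 20, "formamide (%)": 100}

for name, ml in buffer_volumes(50, targets, stocks).items():
    print(f"{name}: {ml:g} mL")
```

With these assumed stocks, 50 mL of buffer would take 10 mL dextran sulfate stock, 5 mL SSC stock, 25 mL formamide and 10 mL water.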
B. Results 

The expressed sequence tag (EST) database is a large collection of random cDNA sequences from a
diversity of libraries. We probed the EST database in silico with the lectin domain of E-selectin. As can be seen
in figure 1, a sequence (T11885) was identified which showed low homology (~23%) to a region of the
E-selectin lectin domain. While this homology appeared to be quite distant, we found that the residues that were
identical were included in the subset of amino acids that have previously been shown to be conserved in the vast
majority of type C lectins (Drickamer, J. Biol. Chem. 263, 9557-9560 [1988]). In addition, searching the
GenBank-EMBL database with the novel EST-derived E-selectin related sequence resulted in only type C lectin
homologies (data not shown), again consistent with the novel sequence being a member of this large family of
proteins.

Because the novel EST sequence was originally derived from a human heart cDNA library, a similar
library was used for PCR analysis using primers deduced from the EST sequence. This resulted in a DNA
fragment containing the same sequence as that found for the database entry, and this fragment was used to probe
a human heart library. In addition, a murine fragment was isolated using similar techniques, and this
fragment was used for the isolation of a cDNA from a murine heart library. Figure 2 illustrates the full length
sequence obtained for the murine cDNA clone. As can be seen from this figure, this large transcript encoded
a protein of 1,479 residues with a molecular weight of approximately 167 kD. The human sequence revealed
approximately 90% amino acid sequence homology with the murine protein. The ATG translational initiation
codon shown in the murine sequence is in the context of a Kozak translational start site, and there are two stop
codons 5 prime to this ATG. A search of GenBank with the deduced murine protein sequence revealed that
this novel sequence was most closely related to the macrophage mannose receptor (32.5% identity) (Taylor et
al., supra; Harris et al., supra), the phospholipase A2 receptor (34% identity) (Higashino et al., supra; Ishizaki
et al., supra; Lambeau et al., supra) and the DEC 205 receptor (33% identity) (Jiang et al., supra), three
members of the family of type C lectins containing multiple lectin domains which all mediate endocytosis
(figure 3). These levels of sequence homology are similar to those found when these three lectin-like receptors
are compared to each other, consistent with the supposition that the novel cDNA described here is a new
member of this family. Further homology analysis by domains revealed that the highest sequence homologies
between these four related proteins were found in the fibronectin type II and lectin-like domains 1-3, consistent
with the possibility that these domains might be functionally important (figure 4). In addition, analysis of the
cytoplasmic domain of the novel type C lectin revealed that it contained a conserved tyrosine residue
(residue number 1,451) in a context similar to the NSYY motif that has been previously found to be important
for the endocytosis of the phospholipase A2 receptor (Zvaritch et al., supra). In summary, the novel receptor
described here is related to three previously described lectins with an overall structure that consists of a signal
sequence, a cysteine rich domain, a fibronectin type II domain, 8 type C lectin domains (10 such domains in
the DEC 205 receptor), a transmembrane domain and a short cytoplasmic domain (figure 4).
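
The percent identity values quoted above (for example, 32.5% with the macrophage mannose receptor) are of the kind obtained by counting identical residues over the aligned positions of two sequences. The following is a minimal sketch in Python of such a calculation. It assumes that the two sequences have already been aligned and that gaps are written as "-"; the specification does not state which alignment program or which denominator convention was used, so the function name, the gap handling and the toy sequences are illustrative assumptions only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over aligned, gap-free columns.

    Assumption: both inputs are pre-aligned strings of equal length,
    with '-' marking a gap. Columns containing a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # gap column: not counted in the denominator
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment (not taken from the figures of the specification).
toy_a = "ACDE-FG"
toy_b = "ACDQWFG"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, would give somewhat different values for the same alignment.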

C. Analysis of the genomic structure of the novel type C lectin

Southern blot analyses with a small region of the novel type C lectin revealed that it was encoded by
a single copy, highly conserved gene, in agreement with the high degree of sequence homology between the
murine and human cDNAs (figure 5). The gene encoding the murine form of the novel type C lectin, with the
exception of the signal sequence and cysteine rich domain exons which could not be isolated from our library,
was characterized using a combination of Southern blotting and PCR analysis of lambda clones using exon
specific probes predicted from the human and murine macrophage mannose receptor gene structures (Kim et
al., Genomics 14(3), 721-727 [1992]; Harris et al., Biochem. Biophys. Res. Commun. 198(2), 682-92 [1994]).
As can be seen from figure 5, the gene was interrupted by a minimum of 28 introns and was spread across at
least 39 kb of DNA. This genomic structure is therefore highly reminiscent of that found for the human and
murine macrophage mannose receptors, both of which were interrupted by a similar number of introns at similar
sites. These data are thus consistent with the supposition that the members of this family of type C lectins were
all derived from an original progenitor gene which was then duplicated and mutated to give rise to these four
different proteins with different functions.

D. Northern blot analysis of transcripts encoding the novel type C lectin

A diverse collection of murine and human tissues was analyzed for expression of the transcript
encoding the novel type C lectin. As can be seen from figure 6, the transcript was found to be expressed in the
earliest murine embryonic stage examined (day 7) and its expression continued throughout embryonic
development. Analysis of human fetal tissues revealed that the transcript was highly expressed in lung and
kidney. Interestingly, a truncated transcript was found to be expressed predominantly in the fetal liver, and this
transcript will be described in greater detail below. Analysis of adult murine tissues revealed that high levels
of expression were detected in the heart, lung and kidney, with lower levels in the brain and muscle.
Interestingly, the transcript in the adult liver in both humans and mice appears to be absent, further supporting
the specificity of the alternately spliced transcript to the fetal liver. Analysis of expression in human tissues
revealed that there were also high transcript levels in the heart as well as in prostate, testis, ovary and intestine,
with lower levels in brain, placenta, lung, kidney, pancreas, spleen, thymus and colon. Analysis of expression
in various transformed cells (figure 6) revealed that the novel lectin was transcribed in at least two different
hematopoietic cell lines, in contrast to its apparent lack of expression in human peripheral blood leukocytes
(PBL). In addition, several other transformed cell lines derived from various tumors were also positive for the
expression of this lectin. In summary, analysis of expression of the novel type C lectin suggests that it is
expressed in a diversity of tissues and throughout development, although it appears to be absent from adult liver
and is found as a smaller transcript in fetal liver. The expression of a smaller transcript in human fetal liver,
together with the complex genomic structure described above, suggested that this RNA might have been
produced through alternate splicing. Analysis of RACE clones derived from the fetal liver revealed that the
smaller transcript appeared to have a divergent 5 prime sequence. In order to further characterize this transcript,
a human fetal liver library was screened, and the resultant positive phage were sequenced. One positive phage
was found which appeared to encode a partial cDNA which corresponded to the smaller transcript. Thus, as can
be seen from figure 7, the resultant sequence is identical to the original, full length lectin until nucleotide 61,
where a divergent sequence is found leading to the 5' end of the transcript contained within this phage. This is
the identical splice site found for intron number 18 in the mannose receptor (Kim et al., supra; Harris et al.,
supra), which interrupts a region in the carboxy-terminus of the fifth lectin domain, consistent with alternate
splicing. In order to demonstrate that this transcript exists, as well as to investigate its tissue specificity, specific
primers were designed from the original transcript as well as from the smaller, alternately spliced transcript
(figure 7). As can be seen from figure 7, analysis of lung, heart and fetal liver RNA revealed that the alternately
spliced, small transcript was specific to the fetal liver, although this tissue also appeared to make the full length
transcript. In addition, analysis of a tissue northern blot with a 30-mer oligonucleotide specific for the
novel region in this transcript revealed a signal only in the fetal liver corresponding to this small RNA (data not
shown). The size of the transcript on northern blots suggests that this alternately spliced transcript should
extend for only a relatively short distance 5' to the lambda clone isolated here. 
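
The comparison described above, in which the fetal liver clone matches the full length cDNA up to a point and then diverges toward the 5' end, amounts to measuring how far two sequences agree when read from their 3' ends. The following is a minimal sketch in Python of that comparison. It assumes that both sequences are written 5' to 3' and end at the same 3' position; the function name and the toy sequences are hypothetical and are not taken from figure 7.

```python
def shared_3prime_length(seq_a, seq_b):
    """Count nucleotides that are identical when both sequences are
    read backwards from their 3' ends, stopping at the first mismatch."""
    shared = 0
    for nt_a, nt_b in zip(reversed(seq_a), reversed(seq_b)):
        if nt_a != nt_b:
            break
        shared += 1
    return shared


# Hypothetical toy sequences, written 5' to 3'.
full_length = "AAAACCCCGGGGTTTT"
variant = "GGTTGGGGTTTT"

shared = shared_3prime_length(full_length, variant)
# Index in the variant (0-based) where the shared 3' stretch begins;
# everything 5' of this index is the divergent sequence.
divergence_index = len(variant) - shared
print(shared, divergence_index)
```

In practice the position at which two cDNAs diverge would be compared with known exon boundaries, as was done here with intron 18 of the mannose receptor gene.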

E. In situ hybridization analysis of the novel type C lectin

In order to examine the types of cells which expressed the transcript encoding the novel type C lectin,
in situ hybridization analyses were performed using murine neonatal and adult tissues. As can be seen from
figure 8, this transcript was found in two very divergent tissue types. For example, the northern blot analysis
of murine adult tissues as well as human fetal tissues (figure 7) suggested a high level of expression of the
transcript in lung, and figure 8 illustrates that this RNA was found to be clearly expressed in the lung. Although
it is difficult to tell at the resolution of the in situ experiments the exact cellular location of the transcript,
because of the highly vascularized nature of the lung, it is possible that it is expressed by the lung endothelium.
The transcript was also found at a number of other highly endothelialized sites, including, for example, the
choroid plexus and the kidney glomeruli (figure 8), but it was not universally expressed at detectable levels
in all endothelium. In addition, examination by PCR of endothelial cell lines derived from murine yolk sac also
demonstrated expression of the lectin (data not shown). The figure also illustrates that the transcript was found
to be highly expressed by chondrocytes at sites of active cartilage deposition. As can be seen in this figure, the
collagenous region of the larynx produced a high level of this transcript, as did other bone forming regions in
the neonate including the developing sternal bones as well as the developing teeth. These data suggest that, in
contrast to the restricted expression of the previously reported members of this family, the novel type C lectin
described here appears to be expressed in a diversity of highly endothelialized regions and bone forming sites
in the embryo as well as in the adult.

G. Discussion 

The recognition of carbohydrates by various calcium dependent, or type C, lectins has recently been
acknowledged as a major aspect of a number of physiological phenomena. These include, for example, the
adhesion of various leukocytic cells to the endothelium under the conditions of vascular flow (Lasky, Ann. Rev.
Biochem. 64, 113-139 [1995]), the binding and engulfment of pathogenic organisms by macrophages (Harris
et al., supra), the recognition of transformed cells by natural killer (NK) cells (Bezouska et al., Nature
372(6502), 150-7 [1994]) and the removal of desialated glycoproteins from the circulation. The importance of
these types of interactions has been highlighted by both naturally occurring and induced
mutations. For example, naturally occurring human mutations in the circulating mannose binding protein result
in sensitivity to various pathogenic infections in affected individuals (Lipscombe et al., Immunology 85(4), 660-7
[1995]), and the production of animals with mutations in various selectin genes precipitates profound defects
in leukocyte trafficking (Mayadas et al., Cell 74(3), 541-554 [1993]; Arbones et al., Immunity 1, 247-260
[1994]). While neither naturally occurring nor induced mutations have yet been reported for the family of
endocytic type C lectins, various in vitro data support the contention that these lectins are also important for a
range of potentially critical functions. We here describe a novel member of the endocytic lectin family which
contains many of the structural features of the previously described members but which reveals several
differences in expression sites with potentially important functional implications. Comparison of the overall
structure of the novel receptor reported here suggests that it is clearly a member of the endocytic type C lectin
family. This is based upon the clear-cut conservation of each of the protein motifs found in this family as
compared to those found in the novel lectin. Thus, the novel receptor contains regions which are homologous
to the cysteine rich, fibronectin type II and multiple lectin domain motifs found in the other three members of
this lectin family, in addition to a signal sequence and transmembrane domain which would orient the receptor
as a type 1 transmembrane protein. Interestingly, the cytoplasmic domain is also homologous with the other
members of this family, and this homology includes a conserved tyrosine within a context similar to the NSYY
motif which is critical for endocytosis (Zvaritch et al., supra). Thus, while the levels of conservation between
these family members appear to be quite low (approximately 30-35%), their overall predicted protein domain
structures, as well as the exon structures of at least the genes for the human and murine macrophage mannose
receptors (Kim et al., supra; Harris et al., supra) and the novel receptor reported here, suggest that they are
clearly a related family of receptors. Thus, it is highly likely that this novel receptor is involved in the uptake
of ligands for the purpose of an endocytic response, as has been found for the other proteins of this family.

With respect to ligand recognition by the novel receptor, previous work has implicated the type C lectin
domains as being critical for the binding activity of the other members of this family. For example, various
deletion analyses of both the macrophage mannose receptor (see the two Taylor et al. articles, supra) and the
phospholipase A2 receptor (Ishizaki et al., supra) have revealed that the type C lectin motifs are involved with
the binding of either high mannose containing glycoproteins (the macrophage mannose receptor) or
phospholipase A2 (the phospholipase A2 receptor). Interestingly, in the case of the latter receptor, the binding
of phospholipase is not carbohydrate dependent, although this receptor will also bind with significant affinity
to highly glycosylated neoglycoproteins such as mannose-BSA (Lambeau et al., supra). The need for multiple
carbohydrate recognition motifs is underlined by the finding that the affinity of the macrophage mannose
receptor for glycosylated proteins is enhanced when more than one motif is expressed in the context of a
truncated receptor (see the two Taylor et al. articles, supra). Because the DEC 205 receptor also appears to bind
glycosylated antigens in order to enhance antigen presentation by dendritic cells and thymic epithelium (Jiang
et al., supra), it seems highly likely that it too utilizes a multiplicity of lectin motifs for high affinity ligand
binding. Finally, comparative analysis of the sequences of the type C lectin motifs in the novel receptor with
those found in the co-crystal structure of the mannose binding protein and mannose (the two Weis et al. papers,
supra; Drickamer et al., supra) (K. Drickamer, personal communication) demonstrates that many of the amino
acids involved with the ligation of calcium and the recognition of either mannose or galactose are found in the
first two lectin motifs of the novel protein, consistent with a role for these motifs in carbohydrate recognition.
Interestingly, this is in contrast with the macrophage mannose receptor, where the fourth lectin type domain
appears to be the one that is most critical for carbohydrate recognition (the two Taylor et al. papers, supra). In
summary, these data thus support the contention that the related lectin reported here is also involved with the
recognition of a highly glycosylated ligand(s) in order to mediate an endocytic uptake.

While the data reported here suggest that the mechanisms of ligand recognition by the novel endocytic
type C lectin may be related to those previously described for the other family members, analysis of the
expression patterns of this new protein suggests that it potentially performs a novel task(s). The expression
patterns of two of the members of the endocytic lectin family, the macrophage mannose receptor and the DEC
205 receptor, reveal a highly restricted transcription of these proteins in macrophages and liver endothelial cells
(the macrophage mannose receptor) or in dendritic cells and thymic epithelium (the DEC 205 receptor), and
these patterns correlate with the known functions of these receptors in immune system function. A broader
expression pattern is observed for the phospholipase A2 receptor. This endocytic receptor is expressed in
various tissues of the embryo and the adult, including the heart, lung, kidney, skeletal muscle and liver in the
adult mouse and the kidney in the embryonic human. This pattern is somewhat reminiscent of the novel receptor
described here, especially the expression in the adult heart, lung and kidney. However, there are several
differences between these two receptors, including the expression of the novel receptor in the embryonic lung
as a large transcript and in the fetal liver as a small, alternately spliced transcript. In addition, the novel receptor
is not expressed at all in adult liver, in contrast to the phospholipase A2 receptor. These differences in expression
pattern are consistent with differences in function between these two more widely expressed lectin-like
receptors.

The cell types that express the novel endocytic lectin also give some clues as to its possible function.
The relatively widespread transcription in adult tissues is consistent with endothelial expression, and the in situ
hybridization analysis also supports this contention. Thus, even though the resolution of these experiments was
insufficient to exactly identify the cell types expressing the novel lectin, it was often found in highly
vascularized areas, including the lung, the kidney glomerulus, the choroid plexus and the bone marrow, to name
a few. These data thus suggest that the novel lectin might function as a vascular carbohydrate binding protein.
In contrast, other members of this family, including the macrophage mannose receptor and the DEC 205
receptor, appear to function as mediators of the immune system, and they are expressed on a small subset of
adult immune system cells. However, because the embryo is in a sterile environment, it is unlikely that the
currently described lectin is involved with this type of function, predominantly because it is expressed
throughout embryonic development beginning as early as day 7 of mouse development. One possible function
that this lectin could perform in the vasculature might be to transport highly glycosylated proteins across the
blood vessel. This could occur either from the lumenal side of the vessel to the extravascular space or in the
other direction, depending upon the disposition of the lectin. If the lectin faced the lumenal side, it might thus
function to transport highly glycosylated proteins from the vascular flow to the extravascular space. Consistent
with its expression on the endothelium is its identification in various endothelial cell lines derived from the
embryo. This type of possible function is, therefore, similar to that hypothesized for the macrophage mannose
receptor expressed on endothelial cells of the liver. In this case, this receptor appears to mediate the clearance
of desialated proteins from the bloodstream. The investigation of this hypothesis awaits the production of
antibodies directed against this novel lectin, which will allow for a higher resolution analysis of the actual
cellular localization of this protein in the embryo and adult. The high level of expression of the novel lectin in
chondrocytes also suggests interesting possibilities. In contrast to endothelial cells, these cells are not directly
exposed to the blood stream, so it is unlikely that the lectin binds to identical ligands in the case of these
matrix-depositing cells. Expression of the lectin was detected in regions of mineralization, such as the sternal and
tooth regions, as well as sites of cartilage deposition, such as the larynx. These data suggest that the lectin might be
involved with the synthesis of cartilage or other types of extracellular matrix produced by the chondrocytes. If
the novel lectin described here is indeed found to be involved with endocytosis, then one possible function in
chondrocytes might be the uptake of highly glycosylated precursor proteins that are degraded and utilized for
extracellular matrix production. A contrasting possibility might be that the chondrocytes utilize this lectin to
remodel the extracellular matrix by the endocytosis of highly glycosylated proteins.

Finally, the identification of the alternately spliced transcript that is specific for the human fetal liver
is a very interesting result with potential implications for hematopoiesis, although the lack of a start codon in the
current clone does not allow us to predict that this transcript encodes a protein. PCR analysis of this transcript
clearly demonstrated that it was completely absent from the heart and lung, and northern blot analysis revealed
a lack of signal for this or the full-length transcript in adult liver. Because fetal liver is a conspicuously important
site of hematopoiesis in the embryo, this result suggests that this transcript may in some way be involved with
fetal hematopoiesis. The possible endothelial localization of the transcript also suggests a possible involvement
in blood cell production, since previous work has suggested that endothelial cells appear to be involved with
the expansion of progenitor cells in the embryo. Interestingly, the spliced transcript lacks the first two lectin
domains which, by sequence homology with the mannose binding protein, may be involved with carbohydrate
recognition. Thus, it is likely that, if this transcript encodes a protein product, this form of the lectin might
utilize other regions of the extracellular portion of the protein for novel receptor-ligand interactions.

In summary, the data reported here provide evidence for a novel member of the endocytic type C lectin
family. This glycoprotein appears to be expressed in a wide variety of tissues in the embryo and adult, and it
is transcribed by chondrocytes and, possibly, endothelial cells.

All documents cited throughout the specification as well as the references cited therein are hereby
expressly incorporated by reference. While the present invention is illustrated with reference to specific
embodiments, the invention is not so limited. It will be understood that further modifications and variations are
possible without departing from the overall concept of the invention. All such modifications are intended to be
within the scope of the present invention.
(D) TOPOLOGY: Linear
TGCGATCCCC TCGCCGGCGG TCATCCGAGC ACAGCGCTAG GGCTGTCTCT 50
GCACGCAGCC CTGCCGTGCG CCCTCCGTAC TCTCGTCCTC CGAGCGCCGC 100
AGGGATGGTA CCCATCCGAC CTGCCCTCGC GCCCTGGCCT CGTCACCTGC 150
TGCGCTGCGT CTTGCTTCTC GGGGGACTGC GTCTCGGCCA CCCGGCGGAC 200
TCCGCCGCCG CCCTCCTGGA GCCTGATGTC TTCCTCATCT TCAGCCAGGG 250
GATGCAGGGC TGTCTGGAGG CCCAGGGTGT GCAGGTCCGA GTCACCCCAT 300
TCTGCAATGC CAGTCTCCCT GCCCAGCGCT GGAAGTGGGT CTCCCGGAAC 350
CGACTCTTCA ACCTGGGTGC CACACAGTGC CTGGGTACAG GCTGGCCAGT 400
CACCAACACC ACAGTTTCCT TGGGCATGTA TGAGTGTGAC AGAGAGGCCT 450
TGAGTCTTCG GATGGCAGTG TCGTACACTA GGGGACCAGT TGTCCCTGCT 500
TCTGGGGGCT CGTGCAAGCA ATGCATCCAA GCCTGGCACC TGGAGCGCGG 550
TGACCAGACC CGCAGTGGCC ATTGGAACAT CTATGGCAGT GAAGAAGACC 600
TATGTGCTCG ACCTTACTAT GAGGTCTACA CCATCCAGGG AAACTCACAC 650
GGAAAGCCGT GCACTATCCC CTTCAAATAC GACAACCAGT GGTTCCACGG 700
CTGCACCAGC ACTGGCAGAG AAGATGGGCA CCTGTGGTGT GCCACCACCC 750
AGGACTACGG CAAAGATGAG CGCTGGGGCT TCTGCCCCAT CAAGAGTAAC 800
GACTGTGAGA CCTTCTGGGA CAAAGACCAG CTGACTGACA GCTGTTACCA 850
GTTTAACTTC CAATCCACAC TGTCCTGGAG GGAGGCCTGG GCCAGCTGCG 900
AGCAGCAGGG TGCAGACTTG CTGAGTATCA CGGAGATCCA CGAGCAGACC 950
TACATCAACG GGCTCCTCAC GGGCTACAGC TCCACGCTAT GGATTGGCCT 1000
TAATGACCTG GATACCAGTG GAGGCTGGCA GTGGTCAGAC AACTCACCCC 1050
TCAAGTACCT CAACTGGGAG AGTGATCAGC CGGACAACCC AGGTGAGGAG 1100
AACTGTGGAG TGATCCGGAC TGAGTCCTCA GGCGGCTGGC AGAACCATGA 1150
CTGCAGCATC GCCCTGCCCT ATGTTTGCAA GAAGAAACCC AACGCTACGG 1200
TCGAGCCCAT CCAGCCAGAC CGGTGGACCA ATGTCAAGGT GGAATGTGAC 1250
CCCAGCTGGC AGCCCTTCCA GGGCCACTGC TACCGCCTGC AGGCCGAGAA 1300
GCGCAGCTGG CAGGAGTCCA AGAGGGCGTG TCTGCGGGGT GGGGGTGACC 1350
TCCTTAGCAT CCACAGCATG GCTGAGCTGG AGTTCATCAC CAAACAGATC 1400
AAGCAAGAGG TGGAGGAGCT ATGGATTGGC CTCAATGATT TGAAACTGCA 1450
GATGAATTTT GAGTGGTCCG ACGGGAGCCT CGTGAGCTTC ACCCACTGGC 1500
ACCCCTTTGA GCCCAACAAC TTTCGTGACA GCCTGGAGGA CTGTGTCACC 1550
ATCTGGGGGC CGGAAGGACG CTGGAACGAC AGTCCCTGTA ACCAGTCCTT 1600
GCCATCCATT TGCAAGAAGG CAGGCCGGCT GAGCCAGGGC GCTGCGGAGG 1650
AGGACCACGA CTGCCGGAAG GGTTGGACGT GGCATAGCCC ATCCTGCTAC 1700
TGGCTGGGAG AGGACCAAGT GATCTACAGT GATGCCCGGC GCCTGTGTAC 1750
TGACCATGGC TCTCAGCTGG TCACCATCAC CAACAGGTTT GAGCAGGCCT 1800
TCGTCAGCAG CCTCATCTAT AACTGGGAGG GCGAATACTT CTGGACAGCC 1850
CTGCAAGACC TCAACAGTAC TGGCTCCTTC CGTTGGCTCA GTGGGGATGA 1900
AGTCATATAT ACCCATTGGA ATCGAGACCA GCCTGGGTAC AGACGTGGAG 1950
GCTGTGTGGC TCTGGCCACT GGCAGTGCCA TGGGACTGTG GGAGGTGAAG 2000
AACTGCACAT CGTTCCGGGC TCGCTACATC TGCCGACAGA GCCTGGGCAC 2050
ACCGGTCACA CCAGAGCTGC CTGGGCCAGA CCCCACGCCC AGCCTCACTG 2100
GCTCCTGTCC CCAGGGCTGG GTCTCAGACC CCAAACTCCG ACACTGCTAT 2150
AAGGTGTTCA GCTCAGAGCG GCTGCAGGAG AAGAAGAGTT GGATCCAGGC 2200
CCTGGGGGTC TGCCGGGAGT TGGGGGCCCA GCTGCTGAGT CTGGCCAGCT 2250
ATGAGGAGGA GCACTTTGTG GCCCACATGC TCAACAAGAT CTTTGGTGAG 2300
TCAGAGCCTG AGAGCCATGA GCAGCACTGG TTTTGGATTG GCCTGAACCG 2350
CAGAGACCCT AGAGAGGGTC ACAGCTGGCG CTGGAGCGAC GGTCTAGGGT 2400
TTTCCTACCA CAATTTTGCC CGGAGCCGAC ATGATGACGA TGATATCCGA 2450
GGCTGTGCAG TGCTGGACCT GGCCTCCCTG CAGTGGGTAC CCATGCAGTG 2500
CCAGACGCAG CTTGACTGGA TCTGCAAGAT CCCTAGAGGT GTGGATGTGC 2550
GGGAACCAGA CATTGGTCGA CAAGGCCGTC TGGAGTGGGT ACGCTTTCAG 2600
GAGGCCGAGT ACAAGTTTTT TGAGCACCAC TCCTCGTGGG CGCAGGCACA 2650
GCGCATCTGC ACCTGGTTCC AGGCAGATCT GACCTCCGTT CACAGCCAAG 2700
CAGAACTGGG CTTCCTGGGG CAAAACCTGC AGAAGCTGTC CTCAGACCAG 2750
GAGCAGCACT GGTGGATCGG CCTGCACACC TTGGAGAGTG ACGGACGCTT 2800
CAGGTGGACA GATGGTTCTA TTATAAACTT CATCTCTTGG GCACCGGGAA 2850
AACCTAGACC CATTGGCAAG GACAAGAAGT GTGTATACAT GACAGCCAGA 2900
CAAGAGGACT GGGGGGACCA GAGGTGCCAT ACGGCTTTGC CCTACATCTG 2950
TAAGCGCAGC AATAGCTCTG GAGAGACTCA GCCCCAAGAC TTGCCACCTT 3000
CAGCCTTAGG AGGCTGCCCC TCCGGTTGGA ACCAGTTCCT CAATAAGTGT 3050
TTCCGAATCC AGGGCCAGGA CCCCCAGGAC AGGGTGAAAT GGTCAGAGGC 3100
ACAGTTCTCC TGTGAACAGC AAGAAGCCCA GCTGGTCACC ATTGCAAACC 3150
CCTTAGAGCA AGCATTTATC ACAGCCAGCC TCCCCAACGT GACCTTTGAC 3200
CTTTGGATTG GCCTGCATGC CTCTCAGAGG GACTTCCAGT GGATTGAACA 3250
AGAACCCCTG CTCTATACCA ACTGGGCACC AGGAGAGCCC TCTGGCCCCA 3300
GCCCTGCTCC CAGTGGCACC AAGCCGACCA GCTGTGCGGT GATCCTGCAC 3350
AGCCCCTCAG CCCACTTCAC TGGCCGCTGG GATGATCGGA GCTGCACAGA 3400
GGAGACGCAT GGCTTCATCT GCCAGAAGGG CACAGACCCC TCGCTAAGCC 3450
CATCCCCAGC AGCAACACCC CCTGCCCCGG GCGCTGAGCT CTCCTATCTC 3500
AACCACACCT TCCGGCTGCT GCAGAAGCCA CTGCGCTGGA AAGATGCTCT 3550
CCTGCTGTGT GAGAGCCGAA ATGCCAGCCT GGCACACGTG CCCGATCCCT 3600
ACACACAAGC CTTCCTCACA CAGGCTGCAC GGGGGCTGCA AACACCACTG 3650
TGGATCGGGC TGGCCAGTGA GGAGGGCTCA CGGAGGTATT CCTGGCTCTC 3700
AGAGGAGCCT CTGAATTATG TGAGCTGGCA AGATGAGGAG CCCCAGCACT 3750
CGGGAGGCTG TGCCTACGTG GATGTGGATG GAACCTGGCG CACCACCAGC 3800
TGTGATACCA AGCTGCAGGG GGCAGTGTGT GGGGTGAGCA GGGGGCCCCC 3850
ACCCCGAAGG ATAAACTACC GTGGCAGCTG TCCTCAGGGC TTGGCTGACT 3900
CGTCCTGGAT TCCCTTCAGG GAGCATTGCT ATTCTTTCCA CATGGAGGTG 3950
CTGTTGGGCC ACAAGGAGGC GCTGCAGCGC TGTCAGAAAG CTGGTGGGAC 4000
GGTTCTGTCC ATTCTTGATG AGATGGAGAA TGTGTTTGTC TGGGAGCACC 4050
TGCAGACAGC TGAAGCCCAA AGTCGAGGTG CCTGGTTGGG CATGAACTTC 4100
AACCCCAAAG GAGGCACGCT GGTCTGGCAA GACAACACAG CTGTGAACTA 4150
TTCTAACTGG GGGCCCCCTG GCCTGGGCCC TAGCATGCTA AGCCACAACA 4200
GCTGCTACTG GATCCAGAGC AGCAGCGGAC TGTGGCGCCC CGGGGCTTGT 4250
ACCAACATCA CCATGGGAGT TGTCTGCAAG CTCCCTAGAG TGGAAGAGAA 4300
CAGCTTCTTG CCATCAGCAG CCCTCCCCGA GAGCCCGGTT GCCCTGGTGG 4350
TGGTGCTGAC AGCGGTGCTG CTCCTCCTGG CCTTGATGAC GGCAGCCCTC 4400
ATCCTCTACC GGCGCCGACA GAGTGCGGAG CGTGGGTCCT TCGAGGGGGC 4450
CCGCTACAGT CGCAGCAGCC ACTCTGGCCC CGCAGAGGCC ACCGAGAAGA 4500
ACATTCTGGT GTCTGACATG GAAATGAACG AACAGCAAGA ATAGAGCCAA 4550
GGGCGTGGTC GGGGTGGAGC CAAAGCGGGG GAGGCAGG 4588
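
The nucleotide listing above and the amino acid listing that follows can be cross-checked by translating the nucleotide sequence from its first ATG. The following is a minimal sketch in Python which uses only the first 150 nucleotides of the listing and the standard genetic code. It assumes that the reading frame opens at the first ATG (nucleotides 105-107 of the listing) and that the two listings describe the same clone; neither assumption is stated in this part of the specification, and the names used in the code are illustrative.

```python
# First 150 nucleotides of the listing above (spaces and position counts removed).
CDNA_5PRIME = (
    "TGCGATCCCCTCGCCGGCGGTCATCCGAGCACAGCGCTAGGGCTGTCTCT"
    "GCACGCAGCCCTGCCGTGCGCCCTCCGTACTCTCGTCCTCCGAGCGCCGC"
    "AGGGATGGTACCCATCCGACCTGCCCTCGCGCCCTGGCCTCGTCACCTGC"
)

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, start, n_codons):
    """Translate n_codons codons of dna, beginning at 0-based index start."""
    codons = [dna[start + 3 * i:start + 3 * i + 3] for i in range(n_codons)]
    return "".join(CODON_TABLE[codon] for codon in codons)


# 0-based index of the first ATG; 104 corresponds to nucleotide 105 of the listing.
start = CDNA_5PRIME.find("ATG")
peptide = translate(CDNA_5PRIME, start, 15)
print(start + 1, peptide)
```

The one-letter result, MVPIRPALAPWPRHL, corresponds to the first fifteen residues of the amino acid listing (Met Val Pro Ile Arg Pro Ala Leu Ala Pro Trp Pro Arg His Leu).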
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1479 amino acids

(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Val Pro Ile Arg Pro Ala Leu Ala Pro Trp Pro Arg His Leu
1               5                   10                  15

Leu Arg Cys Val Leu Leu Leu Gly Gly Leu Arg Leu Gly His Pro
                20                  25                  30

Ala Asp Ser Ala Ala Ala Leu Leu Glu Pro Asp Val Phe Leu Ile
                35                  40                  45

Phe Ser Gln Gly Met Gln Gly Cys Leu Glu Ala Gln Gly Val Gln
                50                  55                  60

Val Arg Val Thr Pro Val Cys Asn Ala Ser Leu Pro Ala Gln Arg
                65                  70                  75

Trp Lys Trp Val Ser Arg Asn Arg Leu Phe Asn Leu Gly Ala Thr
                80                  85                  90

Gln Cys Leu Gly Thr Gly Trp Pro Val Thr Asn Thr Thr Val Ser
                95                  100                 105

Leu Gly Met Tyr Glu Cys Asp Arg Glu Ala Leu Ser Leu Arg Met
                110                 115                 120

Ala Val Ser Tyr Thr Arg Gly Pro Val Val Pro Ala Ser Gly Gly
                125                 130                 135

Ser Cys Lys Gln Cys Ile Gln Ala Trp His Leu Glu Arg Gly Asp
                140                 145                 150

Gln Thr Arg Ser Gly His Trp Asn Ile Tyr Gly Ser Glu Glu Asp
                155                 160                 165

Leu Cys Ala Arg Pro Tyr Tyr Glu Val Tyr Thr Ile Gln Gly Asn
                170                 175                 180

Ser His Gly Lys Pro Cys Thr Ile Pro Phe Lys Tyr Asp Asn Gln
                185                 190                 195

Trp Phe His Gly Cys Thr Ser Thr Gly Arg Glu Asp Gly His Leu
                200                 205                 210

Trp Cys Ala Thr Thr Gln Asp Tyr Gly Lys Asp Glu Arg Trp Gly
                215                 220                 225

Phe Cys Pro Ile Lys Ser Asn Asp Cys Glu Thr Phe Trp Asp Lys
                230                 235                 240

Asp Gln Leu Thr Asp Ser Cys Tyr Gln Phe Asn Phe Gln Ser Thr
                245                 250                 255

Leu Ser Trp Arg Glu Ala Trp Ala Ser Cys Glu Gln Gln Gly Ala
                260                 265                 270

Asp Leu Leu Ser Ile Thr Glu Ile His Glu Gln Thr Tyr Ile Asn
                275                 280                 285

Gly Leu Leu Thr Gly Tyr Ser Ser Thr Leu Trp Ile Gly Leu Asn
                290                 295                 300

Asp Leu Asp Thr Ser Gly Gly Trp Gln Trp Ser Asp Asn Ser Pro
                305                 310                 315

Leu Lys Tyr Leu Asn Trp Glu Ser Asp Gln Pro Asp Asn Pro Gly
                320                 325                 330

Glu Glu Asn Cys Gly Val Ile Arg Thr Glu Ser Ser Gly Gly Trp
                335                 340                 345

Gln Asn His Asp Cys Ser Ile Ala Leu Pro Tyr Val Cys Lys Lys
                350                 355                 360

Lys Pro Asn Ala Thr Val Glu Pro Ile Gln Pro Asp Arg Trp Thr
                365                 370                 375

Asn Val Lys Val Glu Cys Asp Pro Ser Trp Gln Pro Phe Gln Gly
                380                 385                 390

His Cys Tyr Arg Leu Gln Ala Glu Lys Arg Ser Trp Gln Glu Ser
                395                 400                 405

Lys Arg Ala Cys Leu Arg Gly Gly Gly Asp Leu Leu Ser Ile His
                410                 415                 420

Ser Met Ala Glu Leu Glu Phe Ile Thr Lys Gln Ile Lys Gln Glu
                425                 430                 435

Val Glu Glu Leu Trp Ile Gly Leu Asn Asp Leu Lys Leu Gln Met
                440                 445                 450

Asn Phe Glu Trp Ser Asp Gly Ser Leu Val Ser Phe Thr His Trp
                455                 460                 465

His Pro Phe Glu Pro Asn Asn Phe Arg Asp Ser Leu Glu Asp Cys
                470                 475                 480

Val Thr Ile Trp Gly Pro Glu Gly Arg Trp Asn Asp Ser Pro Cys
                485                 490                 495

Asn Gln Ser Leu Pro Ser Ile Cys Lys Lys Ala Gly Arg Leu Ser
                500                 505                 510

Gln Gly Ala Ala Glu Glu Asp His Asp Cys Arg Lys Gly Trp Thr
                515                 520                 525

Trp His Ser Pro Ser Cys Tyr Trp Leu Gly Glu Asp Gln Val Ile
                530                 535                 540
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Tyr Ser Asp Ala Arg 
545 

Val Thr lie Thr Asn 
560 



Arg Leu Cys Thr Asp 
550 

Arg Phe Glu Gin Ala 
565 



His Gly Ser Gin Leu 
555 

Phe Val Ser Ser Leu 
570 



5 lie Tyr Asn Trp Glu Gly Glu Tyr Phe Trp Thr Ala Leu Gin Asp 

575 580 585 

Leu Asn Ser Thr Gly Ser Phe Arg Trp Leu Ser Gly Asp Glu Val 

590 595 600 

lie Tyr Thr His Trp Asn Arg Asp Gin Pro Gly Tyr Arg Arg Gly 

10 605 610 615 

Gly Cys Val Ala Leu Ala Thr Gly Ser Ala Met Gly Leu Trp Glu 

620 625 630 

Val Lys Asn Cys Thr Ser Phe Arg Ala Arg Tyr lie Cys Arg Gin 

635 640 645 

15 Ser Leu Gly Thr Pro Val Thr Pro Glu Leu Pro Gly Pro Asp Pro 

650 655 660 

Thr Pro Ser Leu Thr Gly Ser Cys Pro Gin Gly Trp Val Ser Asp 

665 670 675 

Pro Lys Leu Arg His Cys Tyr Lys Val Phe Ser Ser Glu Arg Leu 

20 680 685 690 

Gin Glu Lys Lys Ser Trp lie Gin Ala Leu Gly Val Cys Arg Glu 

695 700 705 

Leu Gly Ala Gin Leu Leu Ser Leu Ala Ser Tyr Glu Glu Glu His 

710 715 720 

25 Phe Val Ala His Met Leu Asn Lys lie Phe Gly Glu Ser Glu Pro 

725 730 735 

Glu Ser His Glu Gin His Trp Phe Trp lie Gly Leu Asn Arg Arg 

740 745 750 



Asp Pro Arg Glu Gly His Ser Trp Arg Trp Ser Asp Gly Leu Gly
755 760 765

Phe Ser Tyr His Asn Phe Ala Arg Ser Arg His Asp Asp Asp Asp
770 775 780

Ile Arg Gly Cys Ala Val Leu Asp Leu Ala Ser Leu Gln Trp Val
785 790 795

Pro Met Gln Cys Gln Thr Gln Leu Asp Trp Ile Cys Lys Ile Pro
800 805 810

Arg Gly Val Asp Val Arg Glu Pro Asp Ile Gly Arg Gln Gly Arg
815 820 825

Leu Glu Trp Val Arg Phe Gln Glu Ala Glu Tyr Lys Phe Phe Glu
830 835 840

His His Ser Ser Trp Ala Gln Ala Gln Arg Ile Cys Thr Trp Phe
845 850 855

Gln Ala Asp Leu Thr Ser Val His Ser Gln Ala Glu Leu Gly Phe
860 865 870

Leu Gly Gln Asn Leu Gln Lys Leu Ser Ser Asp Gln Glu Gln His
875 880 885

Trp Trp Ile Gly Leu His Thr Leu Glu Ser Asp Gly Arg Phe Arg
890 895 900

Trp Thr Asp Gly Ser Ile Ile Asn Phe Ile Ser Trp Ala Pro Gly
905 910 915

Lys Pro Arg Pro Ile Gly Lys Asp Lys Lys Cys Val Tyr Met Thr
920 925 930

Ala Arg Gln Glu Asp Trp Gly Asp Gln Arg Cys His Thr Ala Leu
935 940 945

Pro Tyr Ile Cys Lys Arg Ser Asn Ser Ser Gly Glu Thr Gln Pro
950 955 960

Gln Asp Leu Pro Pro Ser Ala Leu Gly Gly Cys Pro Ser Gly Trp
965 970 975

Asn Gln Phe Leu Asn Lys Cys Phe Arg Ile Gln Gly Gln Asp Pro
980 985 990

Gln Asp Arg Val Lys Trp Ser Glu Ala Gln Phe Ser Cys Glu Gln
995 1000 1005

Gln Glu Ala Gln Leu Val Thr Ile Ala Asn Pro Leu Glu Gln Ala
1010 1015 1020

Phe Ile Thr Ala Ser Leu Pro Asn Val Thr Phe Asp Leu Trp Ile
1025 1030 1035

Gly Leu His Ala Ser Gln Arg Asp Phe Gln Trp Ile Glu Gln Glu
1040 1045 1050

Pro Leu Leu Tyr Thr Asn Trp Ala Pro Gly Glu Pro Ser Gly Pro
1055 1060 1065

Ser Pro Ala Pro Ser Gly Thr Lys Pro Thr Ser Cys Ala Val Ile
1070 1075 1080

Leu His Ser Pro Ser Ala His Phe Thr Gly Arg Trp Asp Asp Arg
1085 1090 1095

Ser Cys Thr Glu Glu Thr His Gly Phe Ile Cys Gln Lys Gly Thr
1100 1105 1110

Asp Pro Ser Leu Ser Pro Ser Pro Ala Ala Thr Pro Pro Ala Pro
1115 1120 1125

Gly Ala Glu Leu Ser Tyr Leu Asn His Thr Phe Arg Leu Leu Gln
1130 1135 1140

Lys Pro Leu Arg Trp Lys Asp Ala Leu Leu Leu Cys Glu Ser Arg
1145 1150 1155

Asn Ala Ser Leu Ala His Val Pro Asp Pro Tyr Thr Gln Ala Phe
1160 1165 1170

Leu Thr Gln Ala Ala Arg Gly Leu Gln Thr Pro Leu Trp Ile Gly
1175 1180 1185

Leu Ala Ser Glu Glu Gly Ser Arg Arg Tyr Ser Trp Leu Ser Glu
1190 1195 1200

Glu Pro Leu Asn Tyr Val Ser Trp Gln Asp Glu Glu Pro Gln His
1205 1210 1215

Ser Gly Gly Cys Ala Tyr Val Asp Val Asp Gly Thr Trp Arg Thr
1220 1225 1230

Thr Ser Cys Asp Thr Lys Leu Gln Gly Ala Val Cys Gly Val Ser
1235 1240 1245

Arg Gly Pro Pro Pro Arg Arg Ile Asn Tyr Arg Gly Ser Cys Pro
1250 1255 1260

Gln Gly Leu Ala Asp Ser Ser Trp Ile Pro Phe Arg Glu His Cys
1265 1270 1275

Tyr Ser Phe His Met Glu Val Leu Leu Gly His Lys Glu Ala Leu
1280 1285 1290

Gln Arg Cys Gln Lys Ala Gly Gly Thr Val Leu Ser Ile Leu Asp
1295 1300 1305

Glu Met Glu Asn Val Phe Val Trp Glu His Leu Gln Thr Ala Glu
1310 1315 1320

Ala Gln Ser Arg Gly Ala Trp Leu Gly Met Asn Phe Asn Pro Lys
1325 1330 1335

Gly Gly Thr Leu Val Trp Gln Asp Asn Thr Ala Val Asn Tyr Ser
1340 1345 1350

Asn Trp Gly Pro Pro Gly Leu Gly Pro Ser Met Leu Ser His Asn
1355 1360 1365

Ser Cys Tyr Trp Ile Gln Ser Ser Ser Gly Leu Trp Arg Pro Gly
1370 1375 1380

Ala Cys Thr Asn Ile Thr Met Gly Val Val Cys Lys Leu Pro Arg
1385 1390 1395

Val Glu Glu Asn Ser Phe Leu Pro Ser Ala Ala Leu Pro Glu Ser
1400 1405 1410

Pro Val Ala Leu Val Val Val Leu Thr Ala Val Leu Leu Leu Leu
1415 1420 1425

Ala Leu Met Thr Ala Ala Leu Ile Leu Tyr Arg Arg Arg Gln Ser
1430 1435 1440

Ala Glu Arg Gly Ser Phe Glu Gly Ala Arg Tyr Ser Arg Ser Ser
1445 1450 1455

His Ser Gly Pro Ala Glu Ala Thr Glu Lys Asn Ile Leu Val Ser
1460 1465 1470

Asp Met Glu Met Asn Glu Gln Gln Glu
1475 1479

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4771 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AGCGCCGCAG GGATGGTACC CATCCGACCT GCCCTCGCGC CCTGGCCTCG 50
TCACCTGCTG CGCTGCGTCC TGCTCCTCGG GTGCCTGCAC CTCGGCCGTC 100
CCGGCGCCCC TGGGGACGCC GCCCTCCCGG AACCCAACAT CTTCCTCATC 150
TTCAGCCATG GACTGCAGGG CTGCCTGGAG GCCCAGGGCG GGCAGGTCAG 200
AGCCACCCCG GCTTGCAATA CCAGCCTCCC TGCCCAGCGC TGGAAGTGGG 250
TCTCCCGAAA CCGGCTATTC AACCTGGGTA CCATGCAGTG CCTGGGCACA 300
GGCTGGCCAG GCACCAACAC CACGGCCTCC CTGGGCATGT ATGAGTGTGA 350
CCGGGAAGCA CTGAATCTTC GCTGGCATTG TCGTACACTG GGTGACCAGC 400
TGTCCTTGCT CCTGGGGACC CGCACCAGCA ACATATCCAA GCCTGGCACC 450
CTTGAGCGTG GTGACCAGAC CCGCAGTGGC CAGTGGCGCA TCTACGGCAG 500
CGAGGAGGAC CTATGTGCTC TGCCCTACCA CGAGGTCTAC ACCATCCAGG 550
GAAACTCCCA CGGAAAGCCG TGCACCATCC CCTTCAAATA TGACAACCAG 600
TGGTTCCACG GCTGCACCAG CACGGGCCGC GAGGATGGTC ACCTGTGGTG 650
TGCCACCACC CAGGACTACG GCAAAGACGA GCGCTGGGGC TTCTGCCCCA 700
TCAAGAGTAA CGACTGCGAG ACCTTCTGGG ACAAGGACCA GCTGACTGAC 750
AGCTGCTACC AGTTTAACTT CCAGTCCACG CTGTCGTGGA GGGAGGCCTG 800
GGCCAGCTGC GAGCAGCAGG GTGCGGATCT GCTGAGCATC ACGGAGATCC 850
ACGAGCAGAC CTACATCAAC GGCCTCCTCA CTGGGTACAG CTCCACCCTG 900
TGGATCGGCT TGAATGACTT GGACACGAGC GGAGGCTGGC AGTGGTCGGA 950
CAACTCGCCC CTCAAGTACC TCAACTGGGA GAGTGACCAG CCGGACAACC 1000
CCAGTGAGGA GAACTGTGGA GTGATCCGCA CTGAGTCCTC GGGCGGCTGG 1050
CAGAACCGTG ACTGCAGCAT CGCGCTGCCC TATGTGTGCA AGAAGAAGCC 1100
CAACGCCACG GCCGAGCCCA CCCCTCCAGA CAGGTGGGCC AATGTGAAGG 1150
TGGAGTGCGA GCCGAGCTGG CAGCCCTTCC AGGGCCACTG CTACCGCCTG 1200
CAGGCCGAGA AGCGCAGCTG GCAGGAGTCC AAGAAGGCAT GTCTACGGGG 1250
CGGTGGCGAC CTGGTCAGCA TCCACAGCAT GGCGGAGCTG GAATTCATCA 1300
CCAAGCAGAT CAAGCAAGAG GTGGAGGAGC TGTGGATCGG CCTCAACGAT 1350
TTGAAGCTGC AGATGAATTT TGAGTGGTCT GACGGGAGCC TTGTGAGCTT 1400
CACCCACTGG CACCCCTTTG AGCCCAACAA CTTCCGGGAC AGTCTGGAGG 1450
ACTGTGTCAC CATCTGGGGC CCGGAAGGCC GCTGGAACGA CAGTCCCTGT 1500
AACCAGTCCT TGCCATCCAT CTGCAAGAAG GCAGGCCAGC TGAGCCAGGG 1550
GGCCGCCGAG GAGGACCATG GCTGCCGGAA GGGTTGGACG TGGCACAGCC 1600
CATCCTGCTA CTGGCTGGGA GAAGACCAAG TGACCTACAG TGAGGCCCGG 1650
CGCCTGTGCA CTGACCATGG CTCTCAGCTG GTCACCATCA CCAACAGGTT 1700
CGAGCAGGCC TTCGTCAGCA GCCTCATCTA CAACTGGGAG GGCGAGTACT 1750
TCTGGACGGC CCTGCAGGAC CTCAACAGCA CCGGCTCCTT CTTCTGGCTC 1800
AGTGGGGATG AAGTCATGTA CACCCACTGG AACCGGGACC AGCCCGGGTA 1850
CAGCCGTGGG GGCTGCGTGG CGCTGGCCAC TGGCAGCGCC ATGGGGCTGT 1900
GGGAGGTGAA GAACTGTACC TCGTTCCGGG CCCGCTACAT CTGCCGGCAG 1950
AGCCTGGGCA CTCCAGTGAC GCCGGAGCTG CCGGGGCCAG ATCCCACGCC 2000
CAGCCTCACT GGCTCCTGTC CCCAGGGCTG GGCCTCTGAC ACCAAACTCC 2050
GGTATTGCTA TAAGGTGTTC AGCTCAGAGC GGCTGCAGGA CAAGAAGAGC 2100
TGGGTCCAGG CCCAGGGGGC CTGCCAGGAG CTGGGGGCCC AGCTGCTGAG 2150
CCTGGCCAGC TACGAGGAGG AGCACTTTGT GGCCAACATG CTCAACAAGA 2200
TCTTCGGTGA ATCAGAACCC GAGATCCACG AGCAGCACTG GTTCTGGGTC 2250
GGCCTGAACC GTCGGGATCC CAGAGGGGGT CAGAGTTGGC GCAGGAGCGA 2300
CGGCGTAGGG TTCTCTTACC ACAATTTCGA CCGGAGCCGG CACGACGACG 2350
ACGACATCCG AGGCTGTGCG GTGCTGGACC TGGCCTCCCT GCAGTGGGTG 2400
GTCATGCAGT GCGACACACA GCTGGACTGG ATCTGCAAGA TCCCCAGAGG 2450
TACGGACGTG CGAGAGCCCG ACGACAGCCC TCAAGGCCGA CGGGAATGGC 2500
TGCGCTTCCA GGAGGCCGAG TACAAGTTCT TTGAGCACCA CTCCACGTGG 2550
GCGCAGGCGC AGCGCATCTG CACGTGGTTC CAGGCCGAGC TGACCTCCGT 2600
GCACAGCCAG GCGGAGCTAG ACTTCCTGAG CCACAACTTG CAGAAGTTCT 2650
CCCGGGCCCA GGAGCAGCAC TGGTGGATCG GCCTGCACAC CTCTGAGAGC 2700
GATGGGCGCT TCAGATGGAC AGATGGTTCC ATTATAAACT TCATCTCCTG 2750
GGCACCAGGC AAACCTCGGC CTGTCGGCAA GGACAAGAAG TGCGTGTACA 2800
TGACAGCCAG CCGAGAGGAC TGGGGGGACC AGAGGTGCCT GACAGCCTTG 2850
CCCTACATCT GCAAGCGCAG CAACGTCACC AAAGAAACGC AGCCCCCAGT 2900
CCTGCCAACT ACAGCCCTGG GGGGCTGCCC CTCTGACTGG ATCCAGTTCC 2950
TCAACAAGTG TTTTCAGGTC CAGGGCCAGG AACCCCAGAG CCGGGTGAAG 3000
TGGTCAGAGG CACAGTTCTC CTGTGAACAG CAAGAGGCCC AGCTGGTCAC 3050
CATCACAAAC CCCTTAGAGC AAGCATTCAT CACAGCCAGC CTGCCCAATG 3100
TGACCTTTGA CCTTTGGATT GGCCTCCATG CCTCGCAGAG GGACTCCCAG 3150
TGGGTGGAGC AGGAGCCTTT GATGTATGCC AACTGGGCAC CTGGGGAGCC 3200
CTTTGGCCCT AGCCCTGCTC CCAGTGGCAA CAAACCGACC AGCTGTGCGG 3250
TGGTCCTGCA CAGCCCCTCA GCCCACTTCA CTGGCCGCTG GGACGATCGG 3300
AGCTGCACGG AGGAGACCCA TGGCTTCATC TGCCAGAAGG GCACGGACCC 3350
CTCCCTGAGC CCGTCCCCAG CAGCGCTGCC CCCCGCCCCG GGCACTGAGC 3400
TCTCCTACCT CAACGGCACC TTCCGGCTGC TTCAGAAGCC GCTGCGCTGG 3450
CACGATGCCC TCCTGCTGTG TGAGAGCCAC AATGCCAGCC TGGCCTACGT 3500
GCCCGACCCC TACACCCAGG CCTTCCTCAC GCAGGCTGCC CGAGGGCTGC 3550
GCACGCCGCC CTGGATTGGG CTGGCTGGCG AGGAGGGCTC TCGGCGGTAC 3600
TCCTGGGTCT CAGAGGAGCC GCTGAACTAC GTGGGCTGGC AGGACGGGGA 3650
GCCGCAGCAG CCGGGGGGCT GTACCTACGT AGATGTGGAC GGGGCCTGGC 3700
GCACCACCAG CTGTGACACC AAGCTGCAGG GGGCTGTGTG TGGGGTTAGC 3750
AGTGGGCCCC CTCCTCCCCG AAGAATAAGC TACCATGGCA GCTGTCCCCA 3800
GGGACTGGCA GACTCCGCGT GGATTCCCTT CCGGGAGCAC TGCTATTCTT 3850
TCCACATGGA GCTGCTGCTG GGCCACAAGG AGGCGCGACA GCGCTGCCAG 3900
AGAGCGGGTG GGGCCGTCCT GTCTATCCTG GATGAGATGG AGAATGTGTT 3950
TGTCTGGGAG CACCTGCAGA GCTATGAGGG CCAGAGTCGG GGCGCCTGGC 4000
TGGGCATGAA CTTCAACCCC AAAGGAGGCA CTCTGGTCTG GCAGGACAAC 4050
ACAGCTGTGA ACTACTCCAA CTGGGGGCCC CCGGGCTTGG GCCCCAGCAT 4100
GCTGAGCCAC AACAGCTGCT ACTGGATTCA GAGCAACAGC GGGCTATGGC 4150
GCCCCGGCGC TTGCACCAAC ATCACCATGG GTGTCGTCTG CAAGCTTCCT 4200
CGTGCTGAGC GGAGCAGCTT CTCCCCATCA GCGCTTCCAG AGAACCCAGC 4250
GGCCCTGGTG GTGGTGCTGA TGGCGGTGCT GCTGCTCCTG GCCTTGCTGA 4300
CCGCAGCCCT CATCCTTTAC CGGAGGCGCC AGAGCATCGA GCGCGGGGCC 4350
TTTGAGGGTG CCCGCTACAG CCGCAGCAGC TCCAGCCCCA CCGAGGCCAC 4400
CGAGAAGAAC ATCCTGGTGT CAGACATGGA AATGAATGAG CAGCAAGAAT 4450
AGAGCCAGGC GCGTGGGCAG GGCCAGGGCG GGAGGAGCTG GGGAGCTGGG 4500
GCCCTGGGTC AGTCTGGCCC CCCACCAGCT GCCTGTCCAG TTGGCCTATG 4550
GAAGGGTGCC CTTGGGAGTC GCTGTTGGGA GCCGGAGCTG GGCAGAGCCT 4600
GGGCTGGTGG GGGCCGGAAT TCGCCCTATA GTGAGTCGTA TTACAATTCA 4650
CTGGCCGTCG TTTTACAACG TCGTGACTGG GAAAACCTGG CGTTACCAAC 4700
TTAATCGCCT TGCAGCACAT CCCCCTTTCG CCAGCTGGCG TAATAGCGAA 4750
GAGGCCGCAC CGATCGCCTT C 4771

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1479 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Val Pro Ile Arg Pro Ala Leu Ala Pro Trp Pro Arg His Leu
1 5 10 15

Leu Arg Cys Val Leu Leu Leu Gly Cys Leu His Leu Gly Arg Pro
20 25 30

Gly Ala Pro Gly Asp Ala Ala Leu Pro Glu Pro Asn Ile Phe Leu
35 40 45

Ile Phe Ser His Gly Leu Gln Gly Cys Leu Glu Ala Gln Gly Gly
50 55 60

Gln Val Arg Ala Thr Pro Ala Cys Asn Thr Ser Leu Pro Ala Gln
65 70 75

Arg Trp Lys Trp Val Ser Arg Asn Arg Leu Phe Asn Leu Gly Thr
80 85 90

Met Gln Cys Leu Gly Thr Gly Trp Pro Gly Thr Asn Thr Thr Ala
95 100 105

Ser Leu Gly Met Tyr Glu Cys Asp Arg Glu Ala Leu Asn Leu Arg
110 115 120

Trp His Cys Arg Thr Leu Gly Asp Gln Leu Ser Leu Leu Leu Gly
125 130 135

Thr Arg Thr Ser Asn Ile Ser Lys Pro Gly Thr Leu Glu Arg Gly
140 145 150

Asp Gln Thr Arg Ser Gly Gln Trp Arg Ile Tyr Gly Ser Glu Glu
155 160 165

Asp Leu Cys Ala Leu Pro Tyr His Glu Val Tyr Thr Ile Gln Gly
170 175 180

Asn Ser His Gly Lys Pro Cys Thr Ile Pro Phe Lys Tyr Asp Asn
185 190 195

Gln Trp Phe His Gly Cys Thr Ser Thr Gly Arg Glu Asp Gly His
200 205 210

Leu Trp Cys Ala Thr Thr Gln Asp Tyr Gly Lys Asp Glu Arg Trp
215 220 225

Gly Phe Cys Pro Ile Lys Ser Asn Asp Cys Glu Thr Phe Trp Asp
230 235 240

Lys Asp Gln Leu Thr Asp Ser Cys Tyr Gln Phe Asn Phe Gln Ser
245 250 255

Thr Leu Ser Trp Arg Glu Ala Trp Ala Ser Cys Glu Gln Gln Gly
260 265 270

Ala Asp Leu Leu Ser Ile Thr Glu Ile His Glu Gln Thr Tyr Ile
275 280 285

Asn Gly Leu Leu Thr Gly Tyr Ser Ser Thr Leu Trp Ile Gly Leu
290 295 300

Asn Asp Leu Asp Thr Ser Gly Gly Trp Gln Trp Ser Asp Asn Ser
305 310 315

Pro Leu Lys Tyr Leu Asn Trp Glu Ser Asp Gln Pro Asp Asn Pro
320 325 330

Ser Glu Glu Asn Cys Gly Val Ile Arg Thr Glu Ser Ser Gly Gly
335 340 345

Trp Gln Asn Arg Asp Cys Ser Ile Ala Leu Pro Tyr Val Cys Lys
350 355 360

Lys Lys Pro Asn Ala Thr Ala Glu Pro Thr Pro Pro Asp Arg Trp
365 370 375

Ala Asn Val Lys Val Glu Cys Glu Pro Ser Trp Gln Pro Phe Gln
380 385 390

Gly His Cys Tyr Arg Leu Gln Ala Glu Lys Arg Ser Trp Gln Glu
395 400 405

Ser Lys Lys Ala Cys Leu Arg Gly Gly Gly Asp Leu Val Ser Ile
410 415 420

His Ser Met Ala Glu Leu Glu Phe Ile Thr Lys Gln Ile Lys Gln
425 430 435

Glu Val Glu Glu Leu Trp Ile Gly Leu Asn Asp Leu Lys Leu Gln
440 445 450

Met Asn Phe Glu Trp Ser Asp Gly Ser Leu Val Ser Phe Thr His
455 460 465

Trp His Pro Phe Glu Pro Asn Asn Phe Arg Asp Ser Leu Glu Asp
470 475 480

Cys Val Thr Ile Trp Gly Pro Glu Gly Arg Trp Asn Asp Ser Pro
485 490 495

Cys Asn Gln Ser Leu Pro Ser Ile Cys Lys Lys Ala Gly Gln Leu
500 505 510

Ser Gln Gly Ala Ala Glu Glu Asp His Gly Cys Arg Lys Gly Trp
515 520 525

Thr Trp His Ser Pro Ser Cys Tyr Trp Leu Gly Glu Asp Gln Val
530 535 540

Thr Tyr Ser Glu Ala Arg Arg Leu Cys Thr Asp His Gly Ser Gln
545 550 555

Leu Val Thr Ile Thr Asn Arg Phe Glu Gln Ala Phe Val Ser Ser
560 565 570

Leu Ile Tyr Asn Trp Glu Gly Glu Tyr Phe Trp Thr Ala Leu Gln
575 580 585

Asp Leu Asn Ser Thr Gly Ser Phe Phe Trp Leu Ser Gly Asp Glu
590 595 600

Val Met Tyr Thr His Trp Asn Arg Asp Gln Pro Gly Tyr Ser Arg
605 610 615

Gly Gly Cys Val Ala Leu Ala Thr Gly Ser Ala Met Gly Leu Trp
620 625 630

Glu Val Lys Asn Cys Thr Ser Phe Arg Ala Arg Tyr Ile Cys Arg
635 640 645

Gln Ser Leu Gly Thr Pro Val Thr Pro Glu Leu Pro Gly Pro Asp
650 655 660

Pro Thr Pro Ser Leu Thr Gly Ser Cys Pro Gln Gly Trp Ala Ser
665 670 675

Asp Thr Lys Leu Arg Tyr Cys Tyr Lys Val Phe Ser Ser Glu Arg
680 685 690

Leu Gln Asp Lys Lys Ser Trp Val Gln Ala Gln Gly Ala Cys Gln
695 700 705

Glu Leu Gly Ala Gln Leu Leu Ser Leu Ala Ser Tyr Glu Glu Glu
710 715 720

His Phe Val Ala Asn Met Leu Asn Lys Ile Phe Gly Glu Ser Glu
725 730 735

Pro Glu Ile His Glu Gln His Trp Phe Trp Val Gly Leu Asn Arg
740 745 750

Arg Asp Pro Arg Gly Gly Gln Ser Trp Arg Arg Ser Asp Gly Val
755 760 765

Gly Phe Ser Tyr His Asn Phe Asp Arg Ser Arg His Asp Asp Asp
770 775 780

Asp Ile Arg Gly Cys Ala Val Leu Asp Leu Ala Ser Leu Gln Trp
785 790 795

Val Val Met Gln Cys Asp Thr Gln Leu Asp Trp Ile Cys Lys Ile
800 805 810

Pro Arg Gly Thr Asp Val Arg Glu Pro Asp Asp Ser Pro Gln Gly
815 820 825

Arg Arg Glu Trp Leu Arg Phe Gln Glu Ala Glu Tyr Lys Phe Phe
830 835 840

Glu His His Ser Thr Trp Ala Gln Ala Gln Arg Ile Cys Thr Trp
845 850 855

Phe Gln Ala Glu Leu Thr Ser Val His Ser Gln Ala Glu Leu Asp
860 865 870

Phe Leu Ser His Asn Leu Gln Lys Phe Ser Arg Ala Gln Glu Gln
875 880 885

His Trp Trp Ile Gly Leu His Thr Ser Glu Ser Asp Gly Arg Phe
890 895 900

Arg Trp Thr Asp Gly Ser Ile Ile Asn Phe Ile Ser Trp Ala Pro
905 910 915

Gly Lys Pro Arg Pro Val Gly Lys Asp Lys Lys Cys Val Tyr Met
920 925 930

Thr Ala Ser Arg Glu Asp Trp Gly Asp Gln Arg Cys Leu Thr Ala
935 940 945

Leu Pro Tyr Ile Cys Lys Arg Ser Asn Val Thr Lys Glu Thr Gln
950 955 960

Pro Pro Val Leu Pro Thr Thr Ala Leu Gly Gly Cys Pro Ser Asp
965 970 975

Trp Ile Gln Phe Leu Asn Lys Cys Phe Gln Val Gln Gly Gln Glu
980 985 990

Pro Gln Ser Arg Val Lys Trp Ser Glu Ala Gln Phe Ser Cys Glu
995 1000 1005

Gln Gln Glu Ala Gln Leu Val Thr Ile Thr Asn Pro Leu Glu Gln
1010 1015 1020

Ala Phe Ile Thr Ala Ser Leu Pro Asn Val Thr Phe Asp Leu Trp
1025 1030 1035

Ile Gly Leu His Ala Ser Gln Arg Asp Ser Gln Trp Val Glu Gln
1040 1045 1050

Glu Pro Leu Met Tyr Ala Asn Trp Ala Pro Gly Glu Pro Phe Gly
1055 1060 1065

Pro Ser Pro Ala Pro Ser Gly Asn Lys Pro Thr Ser Cys Ala Val
1070 1075 1080

Val Leu His Ser Pro Ser Ala His Phe Thr Gly Arg Trp Asp Asp
1085 1090 1095

Arg Ser Cys Thr Glu Glu Thr His Gly Phe Ile Cys Gln Lys Gly
1100 1105 1110

Thr Asp Pro Ser Leu Ser Pro Ser Pro Ala Ala Leu Pro Pro Ala
1115 1120 1125

Pro Gly Thr Glu Leu Ser Tyr Leu Asn Gly Thr Phe Arg Leu Leu
1130 1135 1140

Gln Lys Pro Leu Arg Trp His Asp Ala Leu Leu Leu Cys Glu Ser
1145 1150 1155

His Asn Ala Ser Leu Ala Tyr Val Pro Asp Pro Tyr Thr Gln Ala
1160 1165 1170

Phe Leu Thr Gln Ala Ala Arg Gly Leu Arg Thr Pro Pro Trp Ile
1175 1180 1185

Gly Leu Ala Gly Glu Glu Gly Ser Arg Arg Tyr Ser Trp Val Ser
1190 1195 1200

Glu Glu Pro Leu Asn Tyr Val Gly Trp Gln Asp Gly Glu Pro Gln
1205 1210 1215

Gln Pro Gly Gly Cys Thr Tyr Val Asp Val Asp Gly Ala Trp Arg
1220 1225 1230

Thr Thr Ser Cys Asp Thr Lys Leu Gln Gly Ala Val Cys Gly Val
1235 1240 1245

Ser Ser Gly Pro Pro Pro Pro Arg Arg Ile Ser Tyr His Gly Ser
1250 1255 1260

Cys Pro Gln Gly Leu Ala Asp Ser Ala Trp Ile Pro Phe Arg Glu
1265 1270 1275

His Cys Tyr Ser Phe His Met Glu Leu Leu Leu Gly His Lys Glu
1280 1285 1290

Ala Arg Gln Arg Cys Gln Arg Ala Gly Gly Ala Val Leu Ser Ile
1295 1300 1305

Leu Asp Glu Met Glu Asn Val Phe Val Trp Glu His Leu Gln Ser
1310 1315 1320

Tyr Glu Gly Gln Ser Arg Gly Ala Trp Leu Gly Met Asn Phe Asn
1325 1330 1335

Pro Lys Gly Gly Thr Leu Val Trp Gln Asp Asn Thr Ala Val Asn
1340 1345 1350

Tyr Ser Asn Trp Gly Pro Pro Gly Leu Gly Pro Ser Met Leu Ser
1355 1360 1365

His Asn Ser Cys Tyr Trp Ile Gln Ser Asn Ser Gly Leu Trp Arg
1370 1375 1380

Pro Gly Ala Cys Thr Asn Ile Thr Met Gly Val Val Cys Lys Leu
1385 1390 1395

Pro Arg Ala Glu Arg Ser Ser Phe Ser Pro Ser Ala Leu Pro Glu
1400 1405 1410

Asn Pro Ala Ala Leu Val Val Val Leu Met Ala Val Leu Leu Leu
1415 1420 1425

Leu Ala Leu Leu Thr Ala Ala Leu Ile Leu Tyr Arg Arg Arg Gln
1430 1435 1440

Ser Ile Glu Arg Gly Ala Phe Glu Gly Ala Arg Tyr Ser Arg Ser
1445 1450 1455

Ser Ser Ser Pro Thr Glu Ala Thr Glu Lys Asn Ile Leu Val Ser
1460 1465 1470

Asp Met Glu Met Asn Glu Gln Gln Glu
1475 1479

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1455 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Arg Leu Leu Leu Leu Leu Ala Phe Ile Ser Val Ile Pro Val
1 5 10 15

Ser Val Gln Leu Leu Asp Ala Arg Gln Phe Leu Ile Tyr Asn Glu
20 25 30

Asp His Lys Arg Cys Val Asp Ala Leu Ser Ala Ile Ser Val Gln
35 40 45

Thr Ala Thr Cys Asn Pro Glu Ala Glu Ser Gln Lys Phe Arg Trp
50 55 60

Val Ser Asp Ser Gln Ile Met Ser Val Ala Phe Lys Leu Cys Leu
65 70 75

Gly Val Pro Ser Lys Thr Asp Trp Ala Ser Val Thr Leu Tyr Ala
80 85 90

Cys Asp Ser Lys Ser Glu Tyr Gln Lys Trp Glu Cys Lys Asn Asp
95 100 105

Thr Leu Phe Gly Ile Lys Gly Thr Glu Leu Tyr Phe Asn Tyr Gly
110 115 120

Asn Arg Gln Glu Lys Asn Ile Lys Leu Tyr Lys Gly Ser Gly Leu
125 130 135

Trp Ser Arg Trp Lys Val Tyr Gly Thr Thr Asp Asp Leu Cys Ser
140 145 150

Arg Gly Tyr Glu Ala Met Tyr Ser Leu Leu Gly Asn Ala Asn Gly
155 160 165

Ala Val Cys Ala Phe Pro Phe Lys Phe Glu Asn Lys Trp Tyr Ala
170 175 180

Asp Cys Thr Ser Ala Gly Arg Ser Asp Gly Trp Leu Trp Cys Gly
185 190 195

Thr Thr Thr Asp Tyr Asp Lys Asp Lys Leu Phe Gly Phe Cys Pro
200 205 210

Leu His Phe Glu Gly Ser Glu Arg Leu Trp Asn Lys Asp Pro Leu
215 220 225

Thr Gly Ile Leu Tyr Gln Ile Asn Ser Lys Ser Ala Leu Thr Trp
230 235 240

His Gln Ala Arg Ala Ser Cys Lys Gln Gln Asn Ala Asp Leu Leu
245 250 255

Ser Val Thr Glu Ile His Glu Gln Met Tyr Leu Thr Gly Leu Thr
260 265 270

Ser Ser Leu Ser Ser Gly Leu Trp Ile Gly Leu Asn Ser Leu Ser
275 280 285

Val Arg Ser Gly Trp Gln Trp Ala Gly Gly Ser Pro Phe Arg Tyr
290 295 300

Leu Asn Leu Pro Gly Ser Pro Ser Ser Glu Pro Gly Lys Ser Cys
305 310 315

Val Ser Leu Asn Pro Gly Lys Asn Ala Lys Trp Glu Asn Leu Glu
320 325 330

Cys Val Gln Lys Leu Gly Tyr Ile Cys Lys Lys Gly Asn Asn Thr
335 340 345

Leu Asn Pro Phe Ile Ile Pro Ser Ala Ser Asp Val Pro Thr Gly
350 355 360

Cys Pro Asn Gln Trp Trp Pro Tyr Ala Gly His Cys Tyr Arg Ile
365 370 375

His Arg Glu Glu Lys Lys Ile Gln Lys Tyr Ala Leu Gln Ala Cys
380 385 390

Arg Lys Glu Gly Gly Asp Leu Ala Ser Ile His Ser Ile Glu Glu
395 400 405

Phe Asp Phe Ile Phe Ser Gln Leu Gly Tyr Glu Pro Asn Asp Glu
410 415 420

Leu Trp Ile Gly Leu Asn Asp Ile Lys Ile Gln Met Tyr Phe Glu
425 430 435

Trp Ser Asp Gly Thr Pro Val Thr Phe Thr Lys Trp Leu Pro Gly
440 445 450

Glu Pro Ser His Glu Asn Asn Arg Gln Glu Asp Cys Val Val Met
455 460 465

Lys Gly Lys Asp Gly Tyr Trp Ala Asp Arg Ala Cys Glu Gln Pro
470 475 480

Leu Gly Tyr Ile Cys Lys Met Val Ser Gln Ser His Ala Val Val
485 490 495

Pro Glu Gly Ala Asp Lys Gly Cys Arg Lys Gly Trp Lys Arg His
500 505 510

Gly Phe Tyr Cys Tyr Leu Ile Gly Ser Thr Leu Ser Thr Phe Thr
515 520 525

Asp Ala Asn His Thr Cys Thr Asn Glu Lys Ala Tyr Leu Thr Thr
530 535 540

Val Glu Asp Arg Tyr Glu Gln Ala Phe Leu Thr Ser Leu Val Gly
545 550 555

Leu Arg Pro Glu Lys Tyr Phe Trp Thr Gly Leu Ser Asp Val Gln
560 565 570

Asn Lys Gly Thr Phe Arg Trp Thr Val Asp Glu Gln Val Gln Phe
575 580 585

Thr His Trp Asn Ala Asp Met Pro Gly Arg Lys Ala Gly Cys Val
590 595 600

Ala Met Lys Thr Gly Val Ala Gly Gly Leu Trp Asp Val Leu Ser
605 610 615

Cys Glu Glu Lys Ala Lys Phe Val Cys Lys His Trp Ala Glu Gly
620 625 630

Val Thr Arg Pro Pro Glu Pro Thr Thr Thr Pro Glu Pro Lys Cys
635 640 645

Pro Glu Asn Trp Gly Thr Thr Ser Lys Thr Ser Met Cys Phe Lys
650 655 660

Leu Tyr Ala Lys Gly Lys His Glu Lys Lys Thr Trp Phe Glu Ser
665 670 675

Arg Asp Phe Cys Lys Ala Ile Gly Gly Glu Leu Ala Ser Ile Lys
680 685 690

Ser Lys Asp Glu Gln Gln Val Ile Trp Arg Leu Ile Thr Ser Ser
695 700 705

Gly Ser Tyr His Glu Leu Phe Trp Leu Gly Leu Thr Tyr Gly Ser
710 715 720

Pro Ser Glu Gly Phe Thr Trp Ser Asp Gly Ser Pro Val Ser Tyr
725 730 735

Glu Asn Trp Ala Tyr Gly Glu Pro Asn Asn Tyr Gln Asn Val Glu
740 745 750

Tyr Cys Gly Glu Leu Lys Gly Asp Pro Gly Met Ser Trp Asn Asp
755 760 765

Ile Asn Cys Glu His Leu Asn Asn Trp Ile Cys Gln Ile Gln Lys
770 775 780

Gly Lys Thr Leu Leu Pro Glu Pro Thr Pro Ala Pro Gln Asp Asn
785 790 795

Pro Pro Val Thr Ala Asp Gly Trp Val Ile Tyr Lys Asp Tyr Gln
800 805 810

Tyr Tyr Phe Ser Lys Glu Lys Glu Thr Met Asp Asn Ala Arg Arg
815 820 825

Phe Cys Lys Lys Asn Phe Gly Asp Leu Ala Thr Ile Lys Ser Glu
830 835 840

Ser Glu Lys Lys Phe Leu Trp Lys Tyr Ile Asn Lys Asn Gly Gly
845 850 855

Gln Ser Pro Tyr Phe Ile Gly Met Leu Ile Ser Met Asp Lys Lys
860 865 870

Phe Ile Trp Met Asp Gly Ser Lys Val Asp Phe Val Ala Trp Ala
875 880 885

Thr Gly Glu Pro Asn Phe Ala Asn Asp Asp Glu Asn Cys Val Thr
890 895 900

Met Tyr Thr Asn Ser Gly Phe Trp Asn Asp Ile Asn Cys Gly Tyr
905 910 915

Pro Asn Asn Phe Ile Cys Gln Arg His Asn Ser Ser Ile Asn Ala
920 925 930

Thr Ala Met Pro Thr Thr Pro Thr Thr Pro Gly Gly Cys Lys Glu
935 940 945

Gly Trp His Leu Tyr Lys Asn Lys Cys Phe Lys Ile Phe Gly Phe
950 955 960

Ala Asn Glu Glu Lys Lys Ser Trp Gln Asp Ala Arg Gln Ala Cys
965 970 975

Lys Gly Leu Lys Gly Asn Leu Val Ser Ile Glu Asn Ala Gln Glu
980 985 990

Gln Ala Phe Val Thr Tyr His Met Arg Asp Ser Thr Phe Asn Ala
995 1000 1005

Trp Thr Gly Leu Asn Asp Ile Asn Ala Glu His Met Phe Leu Trp
1010 1015 1020

Thr Ala Gly Gln Gly Val His Tyr Thr Asn Trp Gly Lys Gly Tyr
1025 1030 1035

Pro Gly Gly Arg Arg Ser Ser Leu Ser Tyr Glu Asp Ala Asp Cys
1040 1045 1050

Val Val Val Ile Gly Gly Asn Ser Arg Glu Ala Gly Thr Trp Met
1055 1060 1065

Asp Asp Thr Cys Asp Ser Lys Gln Gly Tyr Ile Cys Gln Thr Gln
1070 1075 1080

Thr Asp Pro Ser Leu Pro Val Ser Pro Thr Thr Thr Pro Lys Asp
1085 1090 1095

Gly Phe Val Thr Tyr Gly Lys Ser Ser Tyr Ser Leu Met Lys Leu
1100 1105 1110

Lys Leu Pro Trp His Glu Ala Gly Thr Tyr Cys Lys Asp His Thr
1115 1120 1125

Ser Leu Leu Ala Ser Ile Leu Asp Pro Tyr Ser Asn Ala Phe Ala
1130 1135 1140

Trp Met Lys Met His Pro Phe Asn Val Pro Ile Trp Ile Ala Leu
1145 1150 1155

Asn Ser Asn Leu Thr Asn Asn Glu Tyr Thr Trp Thr Asp Arg Trp
1160 1165 1170

Arg Val Arg Tyr Thr Asn Trp Gly Ala Asp Glu Pro Lys Leu Lys
1175 1180 1185

Ser Ala Cys Val Tyr Met Asp Val Asp Gly Tyr Trp Arg Thr Ser
1190 1195 1200

Tyr Cys Asn Glu Ser Phe Tyr Phe Leu Cys Lys Lys Ser Asp Glu
1205 1210 1215

Ile Pro Ala Thr Glu Pro Pro Gln Leu Pro Gly Lys Cys Pro Glu
1220 1225 1230

Ser Glu Gln Thr Ala Trp Ile Pro Phe Tyr Gly His Cys Tyr Tyr
1235 1240 1245

Phe Glu Ser Ser Phe Thr Arg Ser Trp Gly Gln Ala Ser Leu Glu
1250 1255 1260

Cys Leu Arg Met Gly Ala Ser Leu Val Ser Ile Glu Thr Ala Ala
1265 1270 1275

Glu Ser Ser Phe Leu Ser Tyr Arg Val Glu Pro Leu Lys Ser Lys
1280 1285 1290

Thr Asn Phe Trp Ile Gly Met Phe Arg Asn Val Glu Gly Lys Trp
1295 1300 1305

Leu Trp Leu Asn Asp Asn Pro Val Ser Phe Val Asn Trp Lys Thr
1310 1315 1320

Gly Asp Pro Ser Gly Glu Arg Asn Asp Cys Val Val Leu Ala Ser
1325 1330 1335

Ser Ser Gly Leu Trp Asn Asn Ile His Cys Ser Ser Tyr Lys Gly
1340 1345 1350

Phe Ile Cys Lys Met Pro Lys Ile Ile Asp Pro Val Thr Thr His
1355 1360 1365

Ser Ser Ile Thr Thr Lys Ala Asp Gln Arg Lys Met Asp Pro Gln
1370 1375 1380

Pro Lys Gly Ser Ser Lys Ala Ala Gly Val Val Thr Val Val Leu
1385 1390 1395

Leu Ile Val Ile Gly Ala Gly Val Ala Ala Tyr Phe Phe Tyr Lys
1400 1405 1410

Lys Arg His Ala Leu His Ile Pro Gln Glu Ala Thr Phe Glu Asn
1415 1420 1425

Thr Leu Tyr Phe Asn Ser Asn Leu Ser Pro Gly Thr Ser Asp Thr
1430 1435 1440

Lys Asp Leu Met Gly Asn Ile Glu Gln Asn Glu His Ala Ile Ile
1445 1450 1455

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1449 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Arg Thr Gly Arg Val Thr Pro Gly Leu Ala Ala Gly Leu Leu
1 5 10 15

Leu Leu Leu Leu Arg Ser Phe Gly Leu Val Glu Pro Ser Glu Ser
20 25 30

Ser Gly Asn Asp Pro Phe Thr Ile Val His Glu Asn Thr Gly Lys
35 40 45

Cys Ile Gln Pro Leu Ser Asp Trp Val Val Ala Gln Asp Cys Ser
50 55 60

Gly Thr Asn Asn Met Leu Trp Lys Trp Val Ser Gln His Arg Leu
65 70 75

Phe His Leu Glu Ser Gln Lys Cys Leu Gly Leu Asp Ile Thr Lys
80 85 90

Ala Thr Asp Asn Leu Arg Met Phe Ser Cys Asp Ser Thr Val Met
95 100 105

Leu Trp Trp Lys Cys Glu His His Ser Leu Tyr Thr Ala Ala Gln
110 115 120

Tyr Arg Leu Ala Leu Lys Asp Gly Tyr Ala Val Ala Asn Thr Asn
125 130 135

Thr Ser Asp Val Trp Lys Lys Gly Gly Ser Glu Glu Asn Leu Cys
140 145 150

Ala Gln Pro Tyr His Glu Ile Tyr Thr Arg Asp Gly Asn Ser Tyr
155 160 165

Gly Arg Pro Cys Glu Phe Pro Phe Leu Ile Gly Glu Thr Trp Tyr
170 175 180

His Asp Cys Ile His Asp Glu Asp His Ser Gly Pro Trp Cys Ala
185 190 195

Thr Thr Leu Ser Tyr Glu Tyr Asp Gln Lys Trp Gly Ile Cys Leu
200 205 210

Leu Pro Glu Ser Gly Cys Glu Gly Asn Trp Glu Lys Asn Glu Gln
215 220 225

Ile Gly Ser Cys Tyr Gln Phe Asn Asn Gln Glu Ile Leu Ser Trp
230 235 240

Lys Glu Ala Tyr Val Ser Cys Gln Asn Gln Gly Ala Asp Leu Leu
245 250 255

Ser Ile His Ser Ala Ala Glu Leu Ala Tyr Ile Thr Gly Lys Glu
260 265 270

Asp Ile Ala Arg Leu Val Trp Leu Gly Leu Asn Gln Leu Tyr Ser
275 280 285

Ala Arg Gly Trp Glu Trp Ser Asp Phe Arg Pro Leu Lys Phe Leu
290 295 300

Asn Trp Asp Pro Gly Thr Pro Val Ala Pro Val Ile Gly Gly Ser
305 310 315

Ser Cys Ala Arg Met Asp Thr Glu Ser Gly Leu Trp Gln Ser Val
320 325 330

Ser Cys Glu Ser Gln Gln Pro Tyr Val Cys Lys Lys Pro Leu Asn
335 340 345

Asn Thr Leu Glu Leu Pro Asp Val Trp Thr Tyr Thr Asp Thr His
350 355 360

Cys His Val Gly Trp Leu Pro Asn Asn Gly Phe Cys Tyr Leu Leu
365 370 375

Ala Asn Glu Ser Ser Ser Trp Asp Ala Ala His Leu Lys Cys Lys
380 385 390

Ala Phe Gly Ala Asp Leu Ile Ser Met His Ser Leu Ala Asp Val
395 400 405

Glu Val Val Val Thr Lys Leu His Asn Gly Asp Val Lys Lys Glu
410 415 420

Ile Trp Thr Gly Leu Lys Asn Thr Asn Ser Pro Ala Leu Phe Gln
425 430 435

Trp Ser Asp Gly Thr Glu Val Thr Leu Thr Tyr Trp Asn Glu Asn
440 445 450

Glu Pro Ser Val Pro Phe Asn Lys Thr Pro Asn Cys Val Ser Tyr
455 460 465

Leu Gly Lys Leu Gly Gln Trp Lys Val Gln Ser Cys Glu Lys Lys
470 475 480

Leu Arg Tyr Val Cys Lys Lys Lys Gly Glu Ile Thr Lys Asp Ala
485 490 495

Glu Ser Asp Lys Leu Cys Pro Pro Asp Glu Gly Trp Lys Arg His
500 505 510

Gly Glu Thr Cys Tyr Lys Ile Tyr Glu Lys Glu Ala Pro Phe Gly
515 520 525

Thr Asn Cys Asn Leu Thr Ile Thr Ser Arg Phe Glu Gln Glu Phe
530 535 540

Leu Asn Tyr Met Met Lys Asn Tyr Asp Lys Ser Leu Arg Lys Tyr
545 550 555

Phe Trp Thr Gly Leu Arg Asp Pro Asp Ser Arg Gly Glu Tyr Ser
560 565 570

Trp Ala Val Ala Gln Gly Val Lys Gln Ala Val Thr Phe Ser Asn
575 580 585

Trp Asn Phe Leu Glu Pro Ala Ser Pro Gly Gly Cys Val Ala Met
590 595 600

Ser Thr Gly Lys Thr Leu Gly Lys Trp Glu Val Lys Asn Cys Arg
605 610 615

Ser Phe Arg Ala Leu Ser Ile Cys Lys Lys Val Ser Glu Pro Gln
620 625 630

Glu Pro Glu Glu Ala Ala Pro Lys Pro Asp Asp Pro Cys Pro Glu
635 640 645

Gly Trp His Thr Phe Pro Ser Ser Leu Ser Cys Tyr Lys Val Phe
650 655 660

His Ile Glu Arg Ile Val Arg Lys Arg Asn Trp Glu Glu Ala Glu
665 670 675

Arg Phe Cys Gln Ala Leu Gly Ala His Leu Pro Ser Phe Ser Arg
680 685 690

Arg Glu Glu Ile Lys Asp Phe Val His Leu Leu Lys Asp Gln Phe
695 700 705

Ser Gly Gln Arg Trp Leu Trp Ile Gly Leu Asn Lys Arg Ser Pro
710 715 720

Asp Leu Gln Gly Ser Trp Gln Trp Ser Asp Arg Thr Pro Val Ser
725 730 735

Ala Val Met Met Glu Pro Glu Phe Gln Gln Asp Phe Asp Ile Arg
740 745 750

Asp Cys Ala Ala Ile Lys Val Leu Asp Val Pro Trp Arg Arg Val
755 760 765

Trp His Leu Tyr Glu Asp Lys Asp Tyr Ala Tyr Trp Lys Pro Phe
770 775 780

Ala Cys Asp Ala Lys Leu Glu Trp Val Cys Gln Ile Pro Lys Gly
785 790 795

Ser Thr Pro Gln Met Pro Asp Trp Tyr Asn Pro Glu Arg Thr Gly
800 805 810

Ile His Gly Pro Pro Val Ile Ile Glu Gly Ser Glu Tyr Trp Phe
815 820 825

Val Ala Asp Pro His Leu Asn Tyr Glu Glu Ala Val Leu Tyr Cys
830 835 840

Ala Ser Asn His Ser Phe Leu Ala Thr Ile Thr Ser Phe Thr Gly
845 850 855

Leu Lys Ala Ile Lys Asn Lys Leu Ala Asn Ile Ser Gly Glu Glu
860 865 870

Gln Lys Trp Trp Val Lys Thr Ser Glu Asn Pro Ile Asp Arg Tyr
875 880 885

Phe Leu Gly Ser Arg Arg Arg Leu Trp His His Phe Pro Met Thr
890 895 900

Phe Gly Asp Glu Cys Leu His Met Ser Ala Lys Thr Trp Leu Val
905 910 915

Asp Leu Ser Lys Arg Ala Asp Cys Asn Ala Lys Leu Pro Phe Ile
920 925 930

Cys Glu Arg Tyr Asn Val Ser Ser Leu Glu Lys Tyr Ser Pro Asp
935 940 945

Pro Ala Ala Lys Val Gln Cys Thr Glu Lys Trp Ile Pro Phe Gln
950 955 960

Asn Lys Cys Phe Leu Lys Val Asn Ser Gly Pro Val Thr Phe Ser
965 970 975

Gln Ala Ser Gly Ile Cys His Ser Tyr Gly Gly Thr Leu Pro Ser
980 985 990

Val Leu Ser Arg Gly Glu Gln Asp Phe Ile Ile Ser Leu Leu Pro
995 1000 1005

Glu Met Glu Ala Ser Leu Trp Ile Gly Leu Arg Trp Thr Ala Tyr
1010 1015 1020

Glu Arg Ile Asn Arg Trp Thr Asp Asn Arg Glu Leu Thr Tyr Ser
1025 1030 1035

Asn Phe His Pro Leu Leu Val Gly Arg Arg Leu Ser Ile Pro Thr
1040 1045 1050

Asn Phe Phe Asp Asp Glu Ser His Phe His Cys Ala Leu Ile Leu
1055 1060 1065

Asn Leu Lys Lys Ser Pro Leu Thr Gly Thr Trp Asn Phe Thr Ser
1070 1075 1080

Cys Ser Glu Arg His Ser Leu Ser Leu Cys Gln Lys Tyr Ser Glu
1085 1090 1095

Thr Glu Asp Gly Gln Pro Trp Glu Asn Thr Ser Lys Thr Val Lys
1100 1105 1110

Tyr Leu Asn Asn Leu Tyr Lys Ile Ile Ser Lys Pro Leu Thr Trp
1115 1120 1125

His Gly Ala Leu Lys Glu Cys Met Lys Glu Lys Met Arg Leu Val
1130 1135 1140

Ser Ile Thr Asp Pro Tyr Gln Gln Ala Phe Leu Ala Val Gln Ala
1145 1150 1155

Thr Leu Arg Asn Ser Ser Phe Trp Ile Gly Leu Ser Ser Gln Asp
1160 1165 1170

Asp Glu Leu Asn Phe Gly Trp Ser Asp Gly Lys Arg Leu Gln Phe
1175 1180 1185

Ser Asn Trp Ala Gly Ser Asn Glu Gin Leu Asp Asp Cys Val lie 

1190 1195 1200 

15 Leu Asp Thr Asp Gly Phe Trp Lys Thr Ala Asp Cys Asp Asp Asn 

1205 1210 1215 

Gin Pro Gly Ala lie Cys Tyr Tyr Pro Gly Asn Glu Thr Glu Glu 

1220 1225 1230 

Glu Val Arg Ala Leu Asp Thr Ala Lys Cys Pro Ser Pro Val Gin 

20 1235 1240 1245 

Ser Thr Pro Trp lie Pro Phe Gin Asn Ser Cys Tyr Asn Phe Met 

1250 1255 1260 

lie Thr Asn Asn Arg His Lys Thr Val Thr Pro Glu Glu Val Gin 

1265 1270 1275 

25 Ser Thr Cys Glu Lys Leu His Pro Lys Ala His Ser Leu Ser lie 

1280 1285 1290 

Arg Asn Glu Glu Glu Asn Thr Phe Val Val Glu Gin Leu Leu Tyr 

1295 1300 1305 

Phe Asn Tyr lie Ala Ser Trp Val Met Leu Gly lie Thr Tyr Glu 

30 1310 1315 1320 

Asn Asn Ser Leu Met Trp Phe Asp Lys Thr Ala Leu Ser Tyr Thr 

1325 1330 1335 

His Trp Arg Thr Gly Arg Pro Thr Val Lys Asn Gly Lys Phe Leu 

1340 1345 1350 

35 Ala Gly Leu Ser Thr Asp Gly Phe Trp Asp lie Gin Ser Phe Asn 

1355 1360 1365 

Val lie Glu Glu Thr Leu His Phe Tyr Gin His Ser lie Ser Ala 

1370 1375 1380 

Cys Lys lie Glu Met Val Asp Tyr Glu Asp Lys His Asn Tyr Thr 
1385 1390 1395

Gly Ile Ala Ile Leu Phe Ala Val Leu Cys Leu Leu Gly Leu Ile
1400 1405 1410

Ser Leu Ala Ile Trp Phe Leu Leu Gln Arg Ser His Ile Arg Trp
1415 1420 1425

Thr Gly Phe Ser Ser Val Arg Tyr Glu His Gly Thr Asn Glu Asp
1430 1435 1440

Glu Val Met Leu Pro Ser Phe His Asp
1445 1449

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1487 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Val Gln Trp Leu Ala Met Leu Gln Leu Leu Trp Leu Gln Gln
1 5 10 15

Leu Leu Leu Leu Gly Ile His Gln Gly Ile Ala Gln Asp Leu Thr
20 25 30

His Ile Gln Glu Pro Ser Leu Glu Trp Arg Asp Lys Gly Ile Phe
35 40 45

Ile Ile Gln Ser Glu Ser Leu Lys Thr Cys Ile Gln Ala Gly Lys
50 55 60

Ser Val Leu Thr Leu Glu Asn Cys Lys Gln Pro Asn Glu His Met
65 70 75

Leu Trp Lys Trp Val Ser Asp Asp His Leu Phe Asn Val Gly Gly
80 85 90

Ser Gly Cys Leu Gly Leu Asn Ile Ser Ala Leu Glu Gln Pro Leu
95 100 105

Lys Leu Tyr Glu Cys Asp Ser Thr Leu Ile Ser Leu Arg Trp His
110 115 120

Cys Asp Arg Lys Met Ile Glu Gly Pro Leu Gln Tyr Lys Val Gln
125 130 135

Val Lys Ser Asp Asn Thr Val Val Ala Arg Lys Gln Ile His Arg
140 145 150

Trp Ile Ala Tyr Thr Ser Ser Gly Gly Asp Ile Cys Glu His Pro
155 160 165

Ser Arg Asp Leu Tyr Thr Leu Lys Gly Asn Ala His Gly Met Pro
170 175 180

Cys Val Phe Pro Phe Gln Phe Lys Gly His Trp His His Asp Cys
185 190 195 

lie Arg Glu Gly Gin Lys Glu His Leu Leu Trp Cys Ala Thr Thr 
200 205 210 

5 Ser Arg Tyr Glu Glu Asp Glu Lys Trp Gly Phe Cys Pro Asp Pro 

215 220 225 

Thr Ser Met Lys Val Phe Cys Asp Ala Thr Trp Gin Arg Asn Gly 
230 235 240 

Ser Ser Arg lie Cys Tyr Gin Phe Asn Leu Leu Ser Ser Leu Ser 
10 245 250 255 

Trp Asn Gin Ala His Ser Ser Cys Leu Met Gin Gly Gly Ala Leu 
260 265 270 

Leu Ser lie Ala Asp Glu Asp Glu Glu Asp Phe lie Arg Lys His 
275 280 285 

15 Leu Ser Lys Val Val Lys Glu Val Trp lie Gly Leu Asn Gin Leu 

290 295 300 

Asp Glu Lys Ala Gly Trp Gin Trp Ser Asp Gly Thr Pro Leu Ser 
305 310 315 

Tyr Leu Asn Trp Ser Gin Glu lie Thr Pro Gly Pro Phe Val Glu 
20 320 325 330 

His His Cys Gly Thr Leu Glu Val Val Ser Ala Ala Trp Arg Ser 
335 340 345 

Arg Asp Cys Glu Ser Thr Leu Pro Tyr lie Cys Lys Arg Asp Leu 
350 355 360 

25 Asn His Thr Ala Gin Gly lie Leu Glu Lys Asp Ser Trp Lys Tyr 

365 370 375 

His Ala Thr His Cys Asp Pro Asp Trp Thr Pro Phe Asn Arg Lys 
380 385 390 

Cys Tyr Lys Leu Lys Lys Asp Arg Lys Ser Trp Leu Gly Ala Leu 
30 395 400 405 

His Ser Cys Gin Ser Asn Asp Ser Val Leu Met Asp Val Ala Ser 
410 415 420 

Leu Ala Glu Val Glu Phe Leu Val Ser Leu Leu Arg Asp Glu Asn 
425 430 435 

35 Ala Ser Glu Thr Trp lie Gly Leu Ser Ser Asn Lys lie Pro Val 

440 445 450 

Ser Phe Glu Trp Ser Ser Gly Ser Ser Val lie Phe Thr Asn Trp 
455 460 465 

Tyr Pro Leu Glu Pro Arg Ile Leu Pro Asn Arg Arg Gln Leu Cys
470 475 480

Val Ser Ala Glu Glu Ser Asp Gly Arg Trp Lys Val Lys Asp Cys
485 490 495

Lys Glu Arg Leu Phe Tyr lie Cys Lys Lys Ala Gly Gin Val Pro 
5 500 505 510 

Ala Asp Glu Gin Ser Gly Cys Pro Ala Gly Trp Glu Arg His Gly 
515 520 525 

Arg Phe Cys Tyr Lys lie Asp Thr Val Leu Arg Ser Phe Glu Glu 
530 535 540 

10 Ala Ser Ser Gly Tyr Tyr Cys Ser Pro Ala Leu Leu Thr lie Thr 

545 550 555 

Ser Arg Phe Glu Gin Ala Phe lie Thr Ser Leu lie Ser Ser Val 
560 565 570 

Ala Glu Lys Asp Ser Tyr Phe Trp He Ala Leu Gin Asp Gin Asn 
15 575 580 585 

Asn Thr Gly Glu Tyr Thr Trp Lys Thr Val Gly Gin Arg Glu Pro 
590 595 600 

Val Gin Tyr Thr Tyr Trp Asn Thr Arg Gin Pro Ser Asn Arg Gly 
605 610 615 

20 Gly Cys Val Val Val Arg Gly Gly Ser Ser Leu Gly Arg Trp Glu 

620 625 630 

Val Lys Asp Cys Ser Asp Phe Lys Ala Met Ser Leu Cys Lys Thr 
635 640 645 

Pro Val Lys He Trp Glu Lys Thr Glu Leu Glu Glu Arg Trp Pro 
25 650 655 660 

Phe His Pro Cys Tyr Met Asp Trp Glu Ser Ala Thr Gly Leu Ala 
665 670 675 

Ser Cys Phe Lys Val Phe His Ser Glu Lys Val Leu Met Lys Arg 
680 685 690 

30 Ser Trp Arg Glu Ala Glu Ala Phe Cys Glu Glu Phe Gly Ala His 

695 700 705 

Leu Ala Ser Phe Ala His He Glu Glu Glu Asn Phe Val Asn Glu 
710 715 720 

Leu Leu His Ser Lys Phe Asn Trp Thr Gin Glu Arg Gin Phe Trp 
35 725 730 735 

Ile Gly Phe Asn Arg Arg Asn Pro Leu Asn Ala Gly Ser Trp Ala
740 745 750

Trp Ser Asp Gly Ser Pro Val Val Ser Ser Phe Leu Asp Asn Ala 
755 760 765 
Tyr Phe Glu Glu Asp Ala Lys Asn Cys Ala Val Tyr Lys Ala Asn
770 775 780 

Lys Thr Leu Leu Pro Ser Asn Cys Ala Ser Lys His Glu Trp lie 
785 790 795 

5 Cys Arg lie Pro Arg Asp Val Arg Pro Lys Phe Pro Asp Trp Tyr 

800 805 810 

Gin Tyr Asp Ala Pro Trp Leu Phe Tyr Gin Asn Ala Glu Tyr Leu 
815 820 825 

Phe His Thr His Pro Ala Glu Trp Ala Thr Phe Glu Phe Val Cys 
10 830 835 840 

Gly Trp Leu Arg Ser Asp Phe Leu Thr lie Tyr Ser Ala Gin Glu 
845 850 855 

Gin Glu Phe lie His Ser Lys He Lys Gly Leu Thr Lys Tyr Gly 
860 865 870 

15 Val Lys Trp Trp He Gly Leu Glu Glu Gly Gly Ala Arg Asp Gin 

875 880 885 

lie Gin Trp Ser Asn Gly Ser Pro Val He Phe Gin Asn Trp Asp 
890 895 900 

Lys Gly Arg Glu Glu Arg Val Asp Ser Gin Arg Lys Arg Cys Val 
20 905 910 915 

Phe lie Ser Ser He Thr Gly Leu Trp Gly Thr Glu Asn Cys Ser 
920 925 930 

Val Pro Leu Pro Ser He Cys Lys Arg Val Lys He Trp Val He 
935 940 945 

25 Glu Lys Glu Lys Pro Pro Thr Gin Pro Gly Thr Cys Pro Lys Gly 

950 955 960 

Trp Leu Tyr Phe Asn Tyr Lys Cys Phe Leu Val Thr He Pro Lys 
965 970 975 

Asp Pro Arg Glu Leu Lys Thr Trp Thr Gly Ala Gin Glu Phe Cys 
30 980 985 990 

Val Ala Lys Gly Gly Thr Leu Val Ser He Lys Ser Glu Leu Glu 
995 1000 1005 

Gin Ala Phe He Thr Met Asn Leu Phe Gly Gin Thr Thr Asn Val 
1010 1015 1020 

35 Trp He Gly Leu Gin Ser Thr Asn His Glu Lys Trp Val Asn Gly 

1025 1030 1035 

Lys Pro Leu Val Tyr Ser Asn Trp Ser Pro Ser Asp He He Asn 
1040 1045 1050 

Ile Pro Ser Tyr Asn Thr Thr Glu Phe Gln Lys His Ile Pro Leu
1055 1060 1065

Cys Ala Leu Met Ser Ser Asn Pro Asn Phe His Phe Thr Gly Lys
1070 1075 1080

Trp Tyr Phe Asp Asp Cys Gly Lys Glu Gly Tyr Gly Phe Val Cys 
5 1085 1090 1095 

Glu Lys Met Gin Asp Thr Leu Glu His His Val Asn Val Ser Asp 
1100 1105 1110 

Thr Ser Ala lie Pro Ser Thr Leu Glu Tyr Gly Asn Arg Thr Tyr 
1115 1120 1125 

10 Lys lie lie Arg Gly Asn Met Thr Trp Tyr Ala Ala Gly Lys Ser 

1130 1135 1140 

Cys Arg Met His Arg Ala Glu Leu Ala Ser lie Pro Asp Ala Phe 
1145 1150 1155 

His Gin Ala Phe Leu Thr Val Leu Leu Ser Arg Leu Gly His Thr 
15 1160 1165 1170 

His Trp lie Gly Leu Ser Thr Thr Asp Asn Gly Gin Thr Phe Asp 
1175 1180 1185 

Trp Ser Asp Gly Thr Lys Ser Pro Phe Thr Tyr Trp Lys Asp Glu 
1190 1195 1200 

20 Glu Ser Ala Phe Leu Gly Asp Cys Ala Phe Ala Asp Thr Asn Gly 

1205 1210 1215 

Arg Trp His Ser Thr Ala Cys Glu Ser Phe Leu Gin Gly Ala lie 
1220 1225 1230 

Cys His Val Val Thr Glu Thr Lys Ala Phe Glu His Pro Gly Leu 
25 1235 1240 1245 

Cys Ser Glu Thr Ser Val Pro Trp lie Lys Phe Lys Gly Asn Cys 
1250 1255 1260 

Tyr Ser Phe Ser Thr Val Leu Asp Ser Arg Ser Phe Glu Asp Ala 
1265 1270 1275 

30 His Glu Phe Cys Lys Ser Glu Gly Ser Asn Leu Leu Ala lie Arg 

1280 1285 1290 

Asp Ala Ala Glu Asn Ser Phe Leu Leu Glu Glu Leu Leu Ala Phe 
1295 1300 1305 

Gly Ser Ser Val Gin Met Val Trp Leu Asn Ala Gin Phe Asp Asn 
35 1310 1315 1320 

Asn Asn Lys Thr Leu Arg Trp Phe Asp Gly Thr Pro Thr Glu Gin 
1325 1330 1335 

Ser Asn Trp Gly Leu Arg Lys Pro Asp Met Asp His Leu Lys Pro 
1340 1345 1350 
His Pro Cys Val Val Leu Arg Ile Pro Glu Gly Ile Trp His Phe
1355 1360 1365 

Thr Pro Cys Glu Asp Lys Lys Gly Phe lie Cys Lys Met Glu Ala 
1370 1375 1380 

5 Gly lie Pro Ala Val Thr Ala Gin Pro Glu Lys Gly Leu Ser His 

1385 1390 1395 

Ser lie Val Pro Val Thr Val Thr Leu Thr Leu lie lie Ala Leu 
1400 1405 1410 

Gly lie Phe Met Leu Cys Phe Trp lie Tyr Lys Gin Lys Ser Asp 
10 1415 1420 1425 

lie Phe Gin Arg Leu Thr Gly Ser Arg Gly Ser Tyr Tyr Pro Thr 
1430 1435 1440 

Leu Asn Phe Ser Thr Ala His Leu Glu Glu Asn lie Leu lie Ser 
1445 1450 1455 

15 Asp Leu Glu Lys Asn Thr Asn Asp Glu Glu Val Arg Asp Ala Pro 

1460 1465 1470 

Ala Thr Glu Ser Lys Arg Gly His Lys Gly Arg Pro Ile Cys Ile
1475 1480 1485

Ser Pro
1487

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 67 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Thr Tyr Asp Glu Ala Ser Ala Tyr Cys Gln Gln Arg Tyr Thr
1 5 10 15

His Leu Val Ala Ile Gln Asn Lys Glu Glu Ile Glu Tyr Leu Asn
20 25 30

Ser Ile Leu Ser Tyr Ser Pro Ser Tyr Tyr Trp Ile Gly Ile Arg
35 40 45

Lys Val Asn Asn Val Trp Val Trp Val Gly Thr Gln Lys Pro Leu
50 55 60

Thr Glu Glu Ala Lys Asn Trp
65 67

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 67 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Leu Lys Trp Ser Glu Ala Gln Phe Ser Cys Glu Gln Gln Glu Ala
1 5 10 15

Gln Leu Val Thr Ile Thr Asn Pro Leu Glu Gln Ala Phe Ile Thr
20 25 30

Ala Ser Leu Pro Asn Val Thr Phe Asp Leu Trp Ile Gly Leu His
35 40 45

Ala Ser Gln Arg Asp Phe Gln Trp Val Glu Gln Glu Pro Leu Met
50 55 60

Tyr Ala Asn Trp Ala Thr Trp
65 67
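
The amino acid sequences in this listing use the three-letter code, whereas Figure 1 shows the same residues in the one-letter code. The following is a minimal illustrative sketch, not part of the specification, of converting a row of the listing to one-letter form. The function name `three_to_one` and the `OCR_FIXES` table (which assumes that the tokens "lie" and "He" stand for Ile and "Gin" for Gln in scanned copies) are hypothetical; position numbers in the input are skipped.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumption: common scanner misreadings of Ile and Gln in copies of the listing.
OCR_FIXES = {"lie": "Ile", "He": "Ile", "Gin": "Gln"}


def three_to_one(listing: str) -> str:
    """Convert whitespace-separated three-letter residues to a one-letter string.

    Purely numeric tokens (position markers such as 5, 10, 15) are ignored.
    An unrecognized token raises KeyError rather than being guessed.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position number, not a residue
        token = OCR_FIXES.get(token, token)
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# First row (residues 1-15) of SEQ ID NO: 8.
first_row = "Met Thr Tyr Asp Glu Ala Ser Ala Tyr Cys Gln Gln Arg Tyr Thr"
print(three_to_one(first_row))
```

Raising on an unknown token, rather than substituting a placeholder, keeps a damaged residue from silently entering a converted sequence.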

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CCGGAATTCC GGTTTGTTGC CACTGGGAGC AGG 33

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CCCAAGCTTG AAGTGGTCAG AGGCACAGTT CTC 33

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GACGGGCCTG GCTGCGTTCC AGGAGGCCG 29

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GAGGCCCAGC TGGGGGCCGG TGCTGGAGT 29

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

GGGTGGAGCA GGAGCCTTTG ATGTATGCCA 30

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TTTCAGGTCC AGGGCCAGGA ACCCCAGAGC 30
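
The oligonucleotides of SEQ ID NOS: 10-15 can be checked against their stated lengths, and their 5' ends scanned for restriction enzyme recognition sequences. The sketch below is illustrative only and is not part of the specification. The helper names (`clean`, `length_ok`, `find_sites`) are hypothetical, and the choice to scan for the EcoRI (GAATTC) and HindIII (AAGCTT) recognition sequences is an assumption made for the example; the listing itself does not state which sites, if any, the primers were designed to carry.

```python
# Primer sequences as printed in SEQ ID NOS: 10-15, paired with the stated lengths.
PRIMERS = {
    10: ("CCGGAATTCC GGTTTGTTGC CACTGGGAGC AGG", 33),
    11: ("CCCAAGCTTG AAGTGGTCAG AGGCACAGTT CTC", 33),
    12: ("GACGGGCCTG GCTGCGTTCC AGGAGGCCG", 29),
    13: ("GAGGCCCAGC TGGGGGCCGG TGCTGGAGT", 29),
    14: ("GGGTGGAGCA GGAGCCTTTG ATGTATGCCA", 30),
    15: ("TTTCAGGTCC AGGGCCAGGA ACCCCAGAGC", 30),
}

# Assumption: these two well-known recognition sequences are the ones of interest.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def clean(seq: str) -> str:
    """Remove the block spacing used in the listing and upper-case the bases."""
    return "".join(seq.split()).upper()


def length_ok(seq: str, stated: int) -> bool:
    """True when the number of bases equals the length stated in the listing."""
    return len(clean(seq)) == stated


def find_sites(seq: str) -> dict:
    """Map each recognition sequence present to its 0-based start position."""
    bases = clean(seq)
    return {name: bases.find(site) for name, site in SITES.items() if site in bases}


for seq_id, (seq, stated) in PRIMERS.items():
    print(seq_id, length_ok(seq, stated), find_sites(seq))
```

Stripping whitespace before counting means the ten-base block spacing of the listing does not affect the length check.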
Claims:

1. An isolated type C lectin selected from the group consisting of

(1) a polypeptide comprising the amino acid sequence shown in Figure 2 (SEQ. ID. NO: 2);

(2) a polypeptide comprising the amino acid sequence shown in Figure 9 (SEQ. ID. NO: 4);

(3) a further mammalian homologue of polypeptide (1) or (2);

(4) a soluble form of any of the polypeptides (1) - (3) devoid of an active transmembrane domain; and

(5) a derivative of any of the polypeptides (1) - (3), retaining the qualitative carbohydrate recognition
properties of a polypeptide (1), (2) or (3).

2. The type C lectin of claim 1 having at least about 60% sequence identity with the amino acid
sequence shown in Figure 1 (SEQ. ID. NO: 2) or Figure 9 (SEQ. ID. NO: 4).

3. The type C lectin of claim 1 having at least about 80% sequence identity with the amino acid
sequence shown in Figure 1 (SEQ. ID. NO: 2) or Figure 9 (SEQ. ID. NO: 4).

4. The type C lectin of claim 1 having at least about 80% sequence identity with the first three lectin
domains of the amino acid sequence shown in Figure 1 (SEQ. ID. NO: 2) or Figure 9 (SEQ. ID. NO: 4).

5. The type C lectin of claim 1 having at least about 80% sequence identity with the fibronectin type
II domain of the amino acid sequence shown in Figure 1 (SEQ. ID. NO: 2) or Figure 9 (SEQ. ID. NO: 4).

6. The type C lectin of claim 1 which is devoid of an active transmembrane domain and/or a
cytoplasmic domain.

7. The type C lectin of claim 1 unaccompanied by native glycosylation.

8. The type C lectin of claim 1 which has a variant glycosylation.

9. An antagonist of the type C lectin of claim 1.

10. A nucleic acid molecule encoding the type C lectin of claim 1.

11. The nucleic acid molecule of claim 10 encoding at least the fibronectin type II domain and the
first three lectin domains of a type C lectin having the amino acid sequence shown in Figure 1 (SEQ. ID. NO:
2) or Figure 9 (SEQ. ID. NO: 4).

12. The nucleic acid molecule of claim 10 encoding a type C lectin devoid of an active
transmembrane domain and/or a cytoplasmic domain.

13. A vector comprising the nucleic acid molecule of claim 10 operably linked to control sequences
recognized by a host cell transformed with the vector.

14. A host cell transformed with the vector of claim 13.

15. The host cell of claim 14 which is a mammalian cell.

16. The host cell of claim 14 which is a Chinese hamster ovary cell line.

17. A process for producing the type C lectin of claim 1 which comprises transforming a host cell
with nucleic acid encoding said type C lectin, culturing the transformed cell and recovering said type C lectin
from the cell culture.

18. The process of claim 17 wherein said type C lectin is secreted into the culture medium and
recovered from the culture medium.

19. An antibody capable of specific binding to the type C lectin of claim 1.






WO 97/40154 PCT/US97/06347 

20. A hybridoma cell line producing the antibody of claim 19.

21. An immunoadhesin comprising an amino acid sequence of a type C lectin according to claim 1
fused to an immunoglobulin sequence.

22. The immunoadhesin of claim 21 comprising at least the fibronectin type II domain and a
carbohydrate recognition domain of a polypeptide having the amino acid sequence shown in Figure 2 (SEQ. ID.
NO: 2) or Figure 9 (SEQ. ID. NO: 4).

23. The immunoadhesin of claim 21 wherein said immunoglobulin sequence is an immunoglobulin
heavy chain constant domain sequence.

24. The immunoadhesin of claim 23 wherein said immunoglobulin sequence is a constant domain
sequence of an IgG-1, IgG-2 or IgG-3.
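
Claims 2 to 5 recite thresholds of "at least about 60%" and "at least about 80%" sequence identity. By way of illustration only, one common way of computing percent identity from a pairwise alignment is sketched below. The definition used here (identical positions divided by the number of alignment columns, ignoring columns that are gaps in both sequences) is an assumption made for the example and is not a definition taken from the claims; the function names and the toy sequences are likewise hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Both strings must have the same length, with `gap` marking alignment gaps.
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the percent identity is at least `threshold` (e.g. 60 or 80)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: three identical positions out of four aligned columns.
print(percent_identity("ACDE", "ACDF"))
print(meets_threshold("ACDE", "ACDF", 60.0))
```

Other conventions divide by the length of the shorter sequence or exclude all gapped columns, so the result near a threshold can depend on which convention is chosen.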






[Alignment, in one-letter amino acid code, of elam (upper rows, residues 1-67) with T11885 (lower rows, residues 1-67); the two sequences are listed as SEQ ID NOS: 8 and 9.]

Figure 1






1 GAATTCGGCT TCCATCCTCA TACGACTCAC TATAGGGCTC GAGCGCCGCC CGGGCAGGTC GCCGGCGGTC 
71 ATCCGAGCAC AGCGCTAGGG CTGTCTCTGC ACGCAGCCCT GCCGTGCGCC CTCCGTACTC TCGTCCTCCG 
141 AGCGCCGCAG GGATGGTACC CATCCGACCT JCCCTCGCGC CCTGGCCTCG TCACCTGCTG CGCTGCGTCT 

211 TGCTTCTCGG GGGACTGCGT CTCQGCCACC CGGCGGACTC CGCCGCCGCC CTCCTGGAGC C73ATGTCTT 

^ o n r A v S AAA LLEP D V F 
281 CCTCATCTTC JGCCAGGGGA TGCAGGGCTG TCTGGAGGCC CAGGGTGTGC AGGTCCGAGT CACCCCATTC 



V R V T p F 



351 TGCAATGCCA GTCTCCCTGC CCAGCGCTGG JAGTGGGTCT CCCGGAACCG ACTCTTCAAC CTGGGTGCCA 
421 CACAGTGCCT GGGTACAGGC TGGCCAGTCA CCAACACCAC AG^CCTTG OOatLL AGTGTGAC J 
491 AGAGGCCTTG AGTCTTCGAT GGCAGTGTTC CTACACTAGG GGACCAGTTG TCCCTGCTTC TGGGGGCTCG 
561 TGCAAGCAAT GCATCCAAGC CTGGCACCTG GAGCGCGGTG ACCAGACCCG CAG™ JjLLL 
631 ATGGCAGTGA AGAAGACCTA TGTGCTCGAC CTTACTATGA GGTCTACACC ATCCAGGGAA ACTCAcLI 
701 AAAGCCGTGC ACTATCCCCT TCAAATACGA CAACCAGTGG TTCCACGGCT GCACCAGCAC TGGCAGAGAA 

2 7 o7 ? a T g S ac l CACCACCCAG ? A T C&GCA ^ at e agcg c J GGGGC r= A* c f CA 

841 AGAGTAACGA CTGTGAGACC TTCTGGGACA AAGACCAGCT GACTJACAGC TGTTACCAGT TTAACTTCCA 

911 ATCCACACTG TCCTGGAGGG AGGCCTGGGC CAGCTGCGAG CAGCAGGGTG CAGACTTGCT GAGTATCACG 

~ w £>ut. QQGA DLL SIT 

981 GAGATCCACG AGCAGACCTA CATCAACGGG CTCCTCACGG GCTACAGCTC CACGCTATGG ATTGGCCTTA 

1051 ATGACCTGGA TACCAGTGGA GGCTGGCAGT GGTCAGACAA CTCACCCCTC AAGTACCTCA ACTGGGAGAG 

1121 TGATCAGCCG GACAACCCAG GTGAGGAGAA CTGTGGAGTG ATCCGGACTG AGTCCTCAGG CGGCTGGCAG 

"Ji ° C t! K T¥ TTT T 5"?^ AGAAACCCAA CGCTACGGTC GAGCCCATCC 

*^ KPN A T V EPIQ 

1261 AGCCAGACCG GTGGACCAAT GTCAAGGTGG AATGTGACCC CAGCTGGCAG CCCTTCCAGG GCCACTGCTA 
"SI TTT G f CGA6AAGC GCAG T GCA G fT CAAG ^ CG ? G T TOCG ^G GgLacCTC 

V o is. l\ A C L R G G 



G D L 



1401 CTTAGCA^C ACAGCATGGC TGAGCTGGAG TTCATCACCA AACAGATCAA GCAAGAGGTG GAGGAGCTAT 
1471 GGATTGGCCT CAATGATTTG AAACTGCAGA TGAATTTTGA GTGGTCCGAC «oJccL TGAGCT^CAC 
TT^ 0 ?m CCAACAA T T f TGAC f ^ J TG ^CT GTGTCACCAT CttLLcL 
1611 GAAGGACGCT GGAACGACAG TCCCTGTAAC CAGTCCTTGC CATCCATTTG CAAGAAGGCA G3CCGGCTGA
487 EGRW N D S PCN QSLP SIC KKA GRLS 

1681 GCCAGGGCGC TGCGGAGGAG GACCACGACT GCCGGAAGGG TTGGACGTGG CATAGCCCAT CC7GCTACTG 
511 QGA AEE DHDC RKG WTW HSPS C Y W 

1751 GCTGGGAGAG GACCAAGTGA TCTACAGTGA TGCCCGGCGC CTGTGTACTG ACCATGGCTC 7C-GCTGGTC 
534 LGE DQVI YSD ARR LCTD HGS C L V 

1821 ACCATCACCA ACAGGTTTGA GCAGGCCTTC GTCAGCAGCC TCATCTATAA CTGGGAGGGC G^^ACTTCT 
557 TITN RFE QAF VSSL IYN WEG EYFW 

1891 GGACAGCCCT GCAAGACCTC AACAGTACTG GCTCCTTCCG TTGGCTCAGT GGGGATGAAG ^CATATATAC 
581 TAL Q D I* NSTG SFR WLS GDEV IYT 

1961 CCATTGGAAT CGAGACCAGC CTGGGTACAG ACGTGGAGGC TGTGTGGCTC TGGCCACTGG CAGTGCCATG 
604 HWN RDQP GYR RGG CVAL ATG SAM 

2031 GGACTGTGGG AGGTGAAGAA CTGCACATCG TTCCGGGCTC GCTACATCTG CCGACAGAGC CTGGGCACAC 
627 GLWE VKN CTS FRAR YIC RQS LGTP 

2101 CGGTCACACC AGAGCTGCCT GGGCCAGACC CCACGCCCAG CCTCACTGGC TCCTGTCCCC kGGGCTGGGT 
651 VTP ELP GPDP TPS LTG SCPQ G W V 

2171 CTCAGACCCC AAACTCCGAC ACTGCTATAA GGTGTTCAGC TCAGAGCGGC TGCAGGAGAA GAAGAGTTGG 
674 SDP KLRH CYK VFS SERL QEK K S W 

2241 ATCCAGGCCC TGGGGGTCTG CCGGGAGTTG GGGGCCCAGC TGCTGAGTCT GGCCAGCTAT GAGGAG" - ~ " 
697 IQAL GVC REL GAQL LSL ASY EEEH 

2311 ACTTTGTGGC CCACATGCTC AACAAGATCT TTGGTGAGTC AGAGCCTGAG AGCCATGAGC AGCACTGGTT 
721 FVA HML NKIF GES EPE SHEQ HWF 

2381 TTGGATTGGC CTGAACCGCA GAGACCCTAG AGAGGGTCAC AGCTGGCGCT GGAGCGACGG TCTAGGGTTT
744 WIG LNRR DPR EGH SWRW SDG LGF 

2451 TCCTACCACA ATTTTGCCCG GAGCCGACAT GATGACGATG ATATCCGAGG CTGTGCAGTG CTGGACCTGG
767 SYHN FAR SRH DDDD I R G CAV LDLA 

2521 CCTCCCTGCA GTGGGTACCC ATGCAGTGCC AGACGCAGCT TGACTGGATC TGCAAGATCC CTAGAGGTGT 
791 SLQ WVP MQCQ TQL DWI CKIP RGV 

2591 GGATGTGCGG GAACCAGACA TTGGTCGACA AGGCCGTCTG GAGTGGGTAC GCTTTCAGGA GGCCGAGTAC 
814 DVR EPDI GRQ GRL EWVR FQE AEY 

2661 AAGTTTTTTG AGCACCACTC CTCGTGGGCG CAGGCACAGC GCATCTGCAC CTGGTTCCAG GCAGATCTGA
837 KFFE HHS SWA QAQR I C T WFQ ADLT 

2731 CCTCCGTTCA CAGCCAAGCA GAACTGGGCT TCCTGGGGCA AAACCTGCAG AAGCTGTCCT CAGACCAGGA 
861 SVH SQA ELGF LGQ NLQ KLSS DQE 

2801 GCAGCACTGG TGGATCGGCC TGCACACCTT GGAGAGTGAC GGACGCTTCA GGTGGACAGA TGGTTCTATT 
884 Q H W WIGL HTL ESD GRFR WTD GSI 

2871 ATAAACTTCA TCTCTTGGGC ACCGGGAAAA CCTAGACCCA TTGGCAAGGA CAAGAAGTGT GTATACATGA 
907 INFI SWA PGK PRPI GKD KKC VYMT 

2941 CAGCCAGACA AGAGGACTGG GGGGACCAGA GGTGCCATAC GGCTTTGCCC TACATCTGTA AGCGCAGCAA 
931 ARQ EDW GDQR CHT ALP YICK RSN 

3011 TAGCTCTGGA GAGACTCAGC CCCAAGACTT GCCACCTTCA GCCTTAGGAG GCTGCCCCTC CGGTTGGAAC 
954 SSG ETQP QDL PPS ALGG CPS GWN 

3081 CAGTTCCTCA ATAAGTGTTT CCGAATCCAG GGCCAGGACC CCCAGGACAG GGTGAAATGG TCAGAGGCAC 
977 QFLN KCF RIQ GQDP QDR VKW SEAQ 

3151 AGTTCTCCTG TGAACAGCAA GAAGCCCAGC TGGTCACCAT TGCAAACCCC TTAGGGCAAG CATTTATCAC 
1001 FSC EQQ EAQL VTI A N P LGQA FIT 
3221 AGCCAGCCTC CCCAACGTGA CCTTTGACCT TTGGATTGGC CTGCATGCCT CTCAGAGGGA CTTCCAGTGG
1024 ASL P N V T FDL WIG LHAS QRD F Q W 

3291 ATTGAACAAG AACCCCTGCT CTATACCAAC TGGGCACCAG GAGAGCCCTC TGGCCCCAGC CCTGCTGCCA 
1047 IEQE PLL YTN W A P G EPS GPS PAPS 

3361 GTGGCACCAA GCCGACCAGC TGTGCGGTGA TCCTGCACAG CCCCTCAGCC CACTTCACTG GCCG2Z3GGK
1071 GTK PTS CAVI LHS PSA HFTG R D 

3431 TGATCGGAGC TGCACAGAGG AGACGCATGG CTTCATCTGC CAGAAGGGCA CAGACCCCTC GCTAA 3 C C C A 
1094 DRS CTEE THG FIC QKGT DPS LSP 

3501 TCCCCAGCAG CAACACCCCC TGCCCCGGGC GCTGAGCTCT CCTATCTCAA CCACACCTTC CGGCTGCTGC 
1117 SPAA TPP APG A E L S YLN HTF RLLQ 

3571 AGAAGCCACT GCGCTGGAAA GATGCTCTCC TGCTGTGTGA GAGCCGAAAT GCCAGCCTGG CACACGTGCC
1141 KPL RWK DALL LCE SRN A S L A HVP 

3641 CGATCCCTAC ACACAAGCCT TCCTCACACA GGCTGCACGG GGGCTGCAAA CACCACTGTG GATCGGGCTG
1164 DPY TQAF LTQ AAR GLQT PLW IGL 

3711 GCCAGTGAGG AGGGCTCACG GAGGTATTCC TGGCTCTCAG AGGAGCCTCT GAATTATGTG AGCTGGCAAG 
1187 ASEE GSR RYS W L S E EPL NYV SWQD 

3781 ATGAGGAGCC CCAGCACTCG GGAGGCTGTG CCTACGTGGA TGTGGATGGA ACCTGGCGCA CCACGAGCTG
1211 EEP QHS GGCA YVD VDG TWRT TSC 

3851 TGATACCAAG CTGCAGGGGG CAGTGTGTGG GGTGAGCAGG GGGCACCCAC CCCGAAGGAT AAACTAC CGI 
1234 DTK L Q G A VCG VSR GHPP RRI NYR 

3921 GGCAGCTGTC CTCAGGGCTT GGCTGACTCG TCCTGGATTC CCTTCAGGGA GCATTGCTAT TCTTTCCACA 
1257 GSCF QGL A D S, S W I P FRE HCY SFHM 

3991 TGGAGGTGCT GTTGGGCCAC AAGGAGGCGC TGCAGCGCTG TCAGAAAGCT GGTGGGACGG TTCTGTCCAT 
1281 EVL LGH KEAL QRC QKA GGTV LSI 

4061 TCTTGATGAG ATGGAGAATG TGTTTGTCTG GGAGCACCTG CAGACAGCTG AAGCCCAAAG TCGAGGTGCC 
1304 LDE MENV FVW EHL QTAE AQS RGA 

4131 TGGTTGGGCA TGAACTTCAA CCCCAAAGGA GGCACGCTGG TCTGGCAAGA CAACACAGCT GTGAACTATT 
1327 WLGM NFN PKG GTLV WQD NTA VNYS 

4201 CTAACTGGGG GCCCCCTGGC CTGGGCCCTA GCATGCTAAG CCACAACAGC TGCTACTGGA TCCAGAGCAG 
1351 NWG PPG LGPS MLS HNS CYWI QSS 

4271 CAGCGGACTG TGGCGCCCCG GGGCTTGTAC CAACATCACC ATGGGAGTTG TCTGCAAGCT CCCTAGAGTG 
1374 SGL WRPG ACT NIT MGVV CKL PRV 

4341 GAAGAGAACA GCTTCTTGCC ATCAGCAGCC CTCCCCGAGA GCCCGGTTGC CCTGGTGGTG GTGCTGACAG 
1397 EENS FLP S A A LPES PVA LVV VLTA 

4411 CGGTGCTGCT CCTCCTGGCC TTGATGACGG CAGCCCTCAT CCTCTACCGG CGCCGACAGA GTGCGGAGCG 
1421 VLL LLA LMTA A L I LYR RRQS AER 

4481 TGGGTCCTTC GAGGGGGCCC GCTACAGTCG CAGCAGCCAC TCTGGCCCCG CAGAGGCCAC CGAGAAGAAC 
1444 GSF EGAR YSR SSH SGPA EAT EKN 

4551 ATTCTGGTGT CTGACATGGA AATGAACGAA CAGCAAGAAT AGAGCCAAGG GCGTGGTCGG GGTGGAGCCA 
1467 ILVS DME MNE QQEO 

4621 AAGCGGGGGA GGCAGGCAGG GGTGGAGCCA GAGCGGGTAA GGCAGGGGCC CCAGGTCAGC AGGCCCCCAT 
4691 CACCCATCAG CCCAGTTGTC TTTGGATGGC AACCCTTGGG AGTTGCTACT GGGTGCCGGG GGCATAGCTT 
4761 GCCATGGGGT GGGAGTACCC AGCCTACCAT AGAGGCTAGG CTGAGACTTG GCAGTGGGTC ATGTTCCCCT 
4831 TTCCCTTGGG CCTGGGATCG TGTCACCTGG ACCTGGACCC CATGGCAACT GGAGGCAATA TGAGAAGGGA
4901 CATGAGCTTA TTCATGTCTT TTCCTCCCCA GATCCCTGAG CCTAAACCTG CTGACCTGCA GCCTAGGATT 
4971 CTTTCCTATC TGTAGGCCTG GAAAGCCTGC CCCGTCCCTT GGGGTGGCTC TCTGTCACCT CTCCTACTCG
5041 GCTACATCAG TTCTGTCTCC TCACCCTGCC CTCGTGCCTT TTTTTCCACC CAGTGCCTCC
5111 TGGCCCTGGG ACTTGGGTGA TCTCTCTCTC TCTCTCTCTC TCTCTCTCTC TCTCATTCTC
5181 TCTCTGGGTG GGGGTCAGCT GAAGAGGCTG GCCAAGCATC TGTCACTCCT GTGCCTGCTG GAAT33ACCT
5251 AGGGTATGGC AGGAGGGAGC CTAGGTGGCT CAGGTGTACA AACCAGGGCA CCGGTGTGGT GTCTGCTGGA
5321 GTAGAGATGG AACTTCGGAG AGACACCTTA TCCACTCACA GGGTGTCATC TCCTGCTGGT CAGGGGAGGG
5391 CTCTGTCCTT GAAAGAGTCC CCTGTGGGGA CCAAAATAAG TTCCCTAATG TCTCCGGCTT CTGGCTCTGG
5461 CTTGGAGAGA GGGAAGATGG TTTGGAGGGG GAGGGGCGCT GGTGAGGCTG TAACCTGGGA CAGCACCAGG
5531 TGCTACCATC TGGTGTGGCC TAGGAGACCA ACTCATGGAA CCGCTCAGCA CCTTTTTCCA GAGGAGAGTC
5601 CCAGCCAGGA TGGAGAGTGC CAGTCCCCGT GTCCCAGTGC AGGACGATGT GAACAAAAAC TCAAAGCGGA
5671 CCCTCTATTG TAGTTCTTGA CTCTCGAAAT GTGCTACTAT TGTTTGTCTT TTTTTTTTTT TTTAAAGCCG
5741 GGAAAAGAGA AAAAGAATAG CCCCCAAATA AAAACCTTCC AGAGGCTTGA GAAGTCCAAA AAAAAAAAAA
5811 AAAAAAAGTC GACGCGGCCG CGAATTC
MVPIRPALAPWPRHLLRCVLLLGGLRLGHPADSAAALLEPDVFLIFSQGMQGCLEAQGVQ
VRVTPVCNASLPAQRWKWVSRNRLFNLGATQCLGTGWPVTNTTVSLGMYECDREALSLRM 
AVSYTRGPWPASGGSCKQCIQAWHLERGDQTRSGHWNIYGSEEDLCARPYYEVYTIQGN
SHGKPCTIPFKYDNQWFHGCTSTGREDGHLWCATTQDYGKDERWGFCPIKSNDCETFWDK 
DQLTDSCYQFNFQSTLSWREAWASCEQQGADLLSITEIHEQTYINGLLTGYSSTLWIGLN 
DLDTSGGWQWSDNSPLKYLNWESDQPDNPGEENCGVIRTESSGGWQNHDCSIALPYVCKK
KPNATVEPIQPDRWTNVKVECDPSWQPFQGHCYRLQAEKRSWQESKRACLRGGGDLLSIH 
SMAELEFITKQIKQEVEELWIGLNDLKLQMNFEWSDGSLVSFTHWHPFEPNNFRDSLEDC
VTIWGPEGRWNDSPCNQSLPSICKKAGRLSQGAAEEDHDCRKGWTWHSPSCYWLGEDQVI 
YSDARRLCTDHGSQLVTITNRFEQAFVSSLIYNWEGEYFWTALQDLNSTGSFRWLSGDEV 
IYTHWNRDQPGYRRGGCVALATGSAMGLWEVKNCTSFRARYICRQSLGTPVTPELPGPDP 
TPSLTGSCPQGWVSDPKLRHCYKVFSSERLQEKKSWIQALGVCRELGAQLLSLASYEEEH 
FVAHMLNKIFGESEPESHEQHWFWIGLNRRDPREGHSWRWSDGLGFSYHNFARSRHDDDD 
IRGCAVLDLASLQWVPMQCQTQLDWICKIPRGVDVREPDIGRQGRLEWVRFQEAEYKFFE 
HHSSWAQAQRICTWFQADLTSVHSQAELGFLGQNLQKLSSDQEQHWWIGLHTLESDGRFR 
WTDGSIINFISWAPGKPRPIGKDKKCVYMTARQEDWGDQRCHTALPYICKRSNSSGETQP 
QDLPPSALGGCPSGWNQFLNKCFRIQGQDPQDRVKWSEAQFSCEQQEAQLVTIANPLEQA
FITASLPNVTFDLWIGLHASQRDFQWIEQEPLLYTNWAPGEPSGPSPAPSGTKPTSCAVI
LHSPSAHFTGRWDDRSCTEETHGFICQKGTDPSLSPSPAATPPAPGAELSYLNHTFRLLQ 
KPLRWKDALLLCESRNASLAHVPDPYTQAFLTQAARGLQTPLWIGLASEEGSRRYSWLSE 
EPLNYVSWQDEEPQHSGGCAYVDVDGTWRTTSCDTKLQGAVCGVSRGPPPRRINYRGSCP 
QGLADSSWIPFREHCYSFHMEVLLGHKEALQRCQKAGGTVLSILDEMENVFVWEHLQTAE
AQSRGAWLGMNFNPKGGTLVWQDNTAVNYSNWGPPGLGPSMLSHNSCYWIQSSSGLWRPG 
ACTNITMGWCKLPRVEENSFLPSAALPESPVALVVVLTAVLLLLALMTAALILYRRRQS 
AERGSFEGARYSRSSHSGPAEATEKNILVSDMEMNEQQE 